Manifold Labs (Targon)
AI Inference Engineer
Remote · Full-time · Global
Posted within the last 30 days
Job Description
You will optimize the latency and throughput of model inference, design and build reliable production serving systems, and accelerate research on scaling test-time compute. You will implement batching, caching, load balancing, and model parallelism; develop low-level GPU kernels and code generation; apply algorithmic optimizations such as quantization, distillation, and speculative decoding; and test, benchmark, and improve inference reliability for large-scale, high-concurrency deployments.
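To give a concrete flavor of the serving-side work described above, here is a minimal sketch of dynamic request batching, one of the system optimizations this role covers. Everything in it (the `InferenceBatcher` class, the `model_forward` placeholder, and the batch-size/wait-time parameters) is an illustrative assumption, not part of Manifold Labs' actual stack.

```python
# Illustrative only: a minimal dynamic-batching sketch, not Manifold Labs' code.
# Requests arriving within a short window are grouped into one forward pass,
# trading a little latency for much higher GPU utilization and throughput.
import asyncio
from typing import Any, List

async def model_forward(batch: List[Any]) -> List[Any]:
    # Placeholder for a real batched model call (e.g. one GPU forward pass).
    return [f"result for {item}" for item in batch]

class InferenceBatcher:
    def __init__(self, max_batch_size: int = 8, max_wait_ms: float = 5.0):
        self.max_batch_size = max_batch_size
        self.max_wait_ms = max_wait_ms
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, request: Any) -> Any:
        # Each caller gets a future that resolves when its batch completes.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((request, fut))
        return await fut

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            request, fut = await self.queue.get()
            batch, futures = [request], [fut]
            deadline = loop.time() + self.max_wait_ms / 1000
            # Collect more requests until the batch is full or the window closes.
            while len(batch) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    req, f = await asyncio.wait_for(self.queue.get(), timeout)
                except asyncio.TimeoutError:
                    break
                batch.append(req)
                futures.append(f)
            for f, result in zip(futures, await model_forward(batch)):
                f.set_result(result)
```

A production serving system would layer cancellation, priorities, and continuous batching on top of this, but the underlying latency/throughput trade-off is the same.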
Requirements
- Experience with system optimizations for model serving, including batching, caching, load balancing, and model parallelism
- Experience with low-level inference optimizations such as GPU kernels and code generation
- Experience with algorithmic inference optimizations such as quantization, distillation, and speculative decoding
- Experience with large-scale, high-concurrency production serving
- Experience with testing, benchmarking, and reliability of inference services
Responsibilities
- Optimize model inference latency and throughput
- Build reliable production serving systems
- Accelerate research on scaling test-time compute
- Implement batching, caching, and load balancing for model serving
- Develop model parallelism and low-level GPU kernel optimizations
- Implement code generation for inference
- Apply algorithmic optimizations such as quantization, distillation, and speculative decoding (a rough sketch follows this list)
- Test, benchmark, and improve inference service reliability for high-concurrency deployments
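One of the algorithmic optimizations named above, speculative decoding, can be illustrated in a few lines. This is a rough greedy-decoding sketch under assumed `draft_next_tokens` and `target_argmax_tokens` interfaces; it is not this team's implementation, and production variants accept draft tokens probabilistically rather than by exact match.

```python
# Illustrative greedy speculative-decoding sketch; every name here is an
# assumption for the example, not part of the actual Targon codebase.
from typing import Callable, List

def speculative_decode_step(
    prefix: List[int],
    draft_next_tokens: Callable[[List[int], int], List[int]],  # cheap draft model
    target_argmax_tokens: Callable[[List[int]], List[int]],    # expensive target model
    k: int = 4,
) -> List[int]:
    """Propose k tokens with the draft model, verify them with one target-model
    pass, and keep the longest agreeing prefix plus one token from the target.

    target_argmax_tokens(seq) is assumed to return, for every position j, the
    target model's greedy next token given seq[: j + 1]."""
    proposal = draft_next_tokens(prefix, k)
    verified = target_argmax_tokens(prefix + proposal)  # one batched target pass
    accepted: List[int] = []
    for i, drafted in enumerate(proposal):
        target_choice = verified[len(prefix) + i - 1]
        if drafted == target_choice:
            accepted.append(drafted)
        else:
            # First disagreement: take the target model's token and stop.
            accepted.append(target_choice)
            break
    else:
        # Every drafted token was accepted; the same target pass yields one
        # extra "bonus" token for free.
        accepted.append(verified[len(prefix) + len(proposal) - 1])
    return prefix + accepted
```

Each step costs one draft pass plus one target pass regardless of how many drafted tokens are accepted, which is where the latency win comes from.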
Tech Stack
benchmarking, Kubernetes, CUDA, compute, caching, quantization, verifiable inference, reliability, inference, blockchain integration