Manifold Labs (Targon)
AI Inference Engineer
Remote · Full-time · Global
Posted within the last 30 days
Job Description
You will optimize the latency and throughput of model inference, design and build reliable production serving systems, and accelerate research on scaling test-time compute. You will implement batching, caching, load balancing, and model parallelism; develop low-level GPU kernels and code generation; apply algorithmic optimizations such as quantization, distillation, and speculative decoding; and test, benchmark, and improve inference reliability for large-scale, high-concurrency deployments.
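To give a concrete flavor of the serving-side work described above, here is a minimal sketch of dynamic request batching, one of the system optimizations this role covers. Everything in it (the `InferenceBatcher` class, the `model_forward` placeholder, and the batch-size/wait-time parameters) is an illustrative assumption, not part of Manifold Labs' actual stack.

```python
# Illustrative only: a minimal dynamic-batching sketch, not Manifold Labs' code.
# Requests arriving within a short window are grouped into one forward pass,
# trading a little latency for much higher GPU utilization and throughput.
import asyncio
from typing import Any, List

async def model_forward(batch: List[Any]) -> List[Any]:
    # Placeholder for a real batched model call (e.g. one GPU forward pass).
    return [f"result for {item}" for item in batch]

class InferenceBatcher:
    def __init__(self, max_batch_size: int = 8, max_wait_ms: float = 5.0):
        self.max_batch_size = max_batch_size
        self.max_wait_ms = max_wait_ms
        self.queue: asyncio.Queue = asyncio.Queue()

    async def submit(self, request: Any) -> Any:
        # Each caller gets a future that resolves when its batch completes.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((request, fut))
        return await fut

    async def run(self) -> None:
        loop = asyncio.get_running_loop()
        while True:
            request, fut = await self.queue.get()
            batch, futures = [request], [fut]
            deadline = loop.time() + self.max_wait_ms / 1000
            # Collect more requests until the batch is full or the window closes.
            while len(batch) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    req, f = await asyncio.wait_for(self.queue.get(), timeout)
                except asyncio.TimeoutError:
                    break
                batch.append(req)
                futures.append(f)
            for f, result in zip(futures, await model_forward(batch)):
                f.set_result(result)
```

A production serving system would layer cancellation, priorities, and continuous batching on top of this, but the underlying latency/throughput trade-off is the same.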
Requirements
- Experience with system optimizations for model serving, including batching, caching, load balancing, and model parallelism
- Experience with low-level inference optimizations such as GPU kernels and code generation
- Experience with algorithmic inference optimizations such as quantization, distillation, and speculative decoding
- Experience with large-scale, high-concurrency production serving
- Experience with testing, benchmarking, and reliability of inference services
Responsibilities
- Optimize model inference latency and throughput
- Build reliable production serving systems
- Accelerate research on scaling test-time compute
- Implement batching, caching, and load balancing for model serving
- Develop model parallelism and low-level GPU kernel optimizations
- Implement code generation for inference
- Apply algorithmic optimizations such as quantization, distillation, and speculative decoding (a rough sketch follows this list)
- Test, benchmark, and improve inference service reliability for high-concurrency deployments
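One of the algorithmic optimizations named above, speculative decoding, can be illustrated in a few lines. This is a rough greedy-decoding sketch under assumed `draft_next_tokens` and `target_argmax_tokens` interfaces; it is not this team's implementation, and production variants accept draft tokens probabilistically rather than by exact match.

```python
# Illustrative greedy speculative-decoding sketch; every name here is an
# assumption for the example, not part of the actual Targon codebase.
from typing import Callable, List

def speculative_decode_step(
    prefix: List[int],
    draft_next_tokens: Callable[[List[int], int], List[int]],  # cheap draft model
    target_argmax_tokens: Callable[[List[int]], List[int]],    # expensive target model
    k: int = 4,
) -> List[int]:
    """Propose k tokens with the draft model, verify them with one target-model
    pass, and keep the longest agreeing prefix plus one token from the target.

    target_argmax_tokens(seq) is assumed to return, for every position j, the
    target model's greedy next token given seq[: j + 1]."""
    proposal = draft_next_tokens(prefix, k)
    verified = target_argmax_tokens(prefix + proposal)  # one batched target pass
    accepted: List[int] = []
    for i, drafted in enumerate(proposal):
        target_choice = verified[len(prefix) + i - 1]
        if drafted == target_choice:
            accepted.append(drafted)
        else:
            # First disagreement: take the target model's token and stop.
            accepted.append(target_choice)
            break
    else:
        # Every drafted token was accepted; the same target pass yields one
        # extra "bonus" token for free.
        accepted.append(verified[len(prefix) + len(proposal) - 1])
    return prefix + accepted
```

Each step costs one draft pass plus one target pass regardless of how many drafted tokens are accepted, which is where the latency win comes from.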
Tech Stack
benchmarking, Kubernetes, CUDA, compute, caching, quantization, verifiable inference, reliability, inference, blockchain integration