Fortytwo
Senior MLOps Engineer
Remote · Full-time · Global · Mid-level
Active · Posted within the last 30 days
Job Description
[AI-summarized by JobStash]
You will deploy and maintain production ML infrastructure, optimize GPU utilization, and serve large and small language models. You will build CI/CD pipelines, create Helm templates for Kubernetes deployments, implement model optimization and serving workflows, and set up monitoring, logging, and automated workflows to ensure reliable model delivery.
Requirements
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- Proficiency in Kubernetes, Helm, and containerization technologies
- Experience with GPU optimization, including MIG and NOS
- Experience with cloud platforms such as AWS, GCP, and Azure
- Knowledge of monitoring tools such as Grafana and Prometheus
- Proficiency in scripting languages such as Python and Bash
- Hands-on experience with CI/CD tools and workflow management systems
- Familiarity with Triton Inference Server, ONNX, and TensorRT
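To make the GPU-optimization requirement concrete: with NVIDIA MIG, an A100 can be partitioned into isolated slices, and a Kubernetes workload requests a slice instead of a whole GPU. The sketch below is illustrative only — the deployment name, image tag, and MIG profile are assumptions, not details from this posting; the resource name follows the NVIDIA device plugin's "mixed" MIG strategy.

```yaml
# Hypothetical deployment requesting a single 1g.5gb MIG slice
# rather than a full GPU (names and tags are examples only).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: slm-inference          # hypothetical service name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: slm-inference
  template:
    metadata:
      labels:
        app: slm-inference
    spec:
      containers:
        - name: triton
          image: nvcr.io/nvidia/tritonserver:24.05-py3   # example tag
          resources:
            limits:
              nvidia.com/mig-1g.5gb: 1   # one MIG slice, not a whole GPU
```

Requesting a MIG slice this way lets several inference pods share one physical GPU with hardware-level isolation, which is typically how serving small models is made cost-efficient.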
Responsibilities
- Deploy scalable, production-ready ML services with optimized infrastructure
- Manage and autoscale Kubernetes clusters
- Optimize GPU resources using MIG and NOS
- Manage cloud storage to ensure high availability and performance
- Integrate LoRA and model-merging workflows
- Adapt and deploy state-of-the-art ML codebases
- Deploy and manage LLMs, SLMs, and LMMs
- Serve models using Triton Inference Server and other serving frameworks
- Leverage vLLM and TGI for model serving
- Optimize models with ONNX and TensorRT
- Develop Retrieval-Augmented Generation (RAG) systems
- Set up monitoring and logging with Grafana, Prometheus, Loki, Elasticsearch, and OpenSearch
- Write and maintain CI/CD pipelines using GitHub Actions
- Create Helm templates for rapid Kubernetes node deployment
- Automate workflows using cron jobs and Airflow DAGs
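The CI/CD and Helm responsibilities above typically combine into a workflow that builds a serving image and rolls out a Helm release. This is a minimal sketch, assuming a GHCR image, a local chart path, and cluster credentials already configured on the runner — none of these specifics come from the posting.

```yaml
# Hypothetical GitHub Actions workflow: build/push the inference image,
# then upgrade the Helm release on pushes to main. Image name and chart
# path are assumptions for illustration.
name: deploy-inference
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push image
        run: |
          docker build -t ghcr.io/example/inference:${{ github.sha }} .
          docker push ghcr.io/example/inference:${{ github.sha }}
      - name: Upgrade Helm release
        run: |
          helm upgrade --install inference ./charts/inference \
            --set image.tag=${{ github.sha }}
```

Pinning the image tag to the commit SHA keeps each deployment traceable and makes `helm rollback` straightforward when a release misbehaves.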
Tech Stack
NOS, GitHub Actions, ONNX, fine-tuning, monitoring, GPU, Loki, S3, model serving, LMM