Fortytwo
Senior MLOps Engineer
Remote · Full-time · Global · Mid-level
Active · Posted within the last 30 days
Job Description
[AI-summarized by JobStash]
You will deploy and maintain production ML infrastructure, optimize GPU utilization, and serve large and small language models. You will build CI/CD pipelines, create Helm templates for Kubernetes deployments, implement model optimization and serving workflows, and set up monitoring, logging, and automated workflows to ensure reliable model delivery.
Requirements
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- Proficiency in Kubernetes, Helm, and containerization technologies
- Experience with GPU optimization, including MIG and NOS
- Experience with cloud platforms such as AWS, GCP, and Azure
- Knowledge of monitoring tools such as Grafana and Prometheus
- Proficiency in scripting languages such as Python and Bash
- Hands-on experience with CI/CD tools and workflow management systems
- Familiarity with Triton Inference Server, ONNX, and TensorRT
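To make the GPU-optimization requirement concrete: with NVIDIA MIG, an A100 can be partitioned into isolated slices, and a Kubernetes workload requests a slice instead of a whole GPU. The sketch below is illustrative only — the deployment name, image tag, and MIG profile are assumptions, not details from this posting; the resource name follows the NVIDIA device plugin's "mixed" MIG strategy.

```yaml
# Hypothetical deployment requesting a single 1g.5gb MIG slice
# rather than a full GPU (names and tags are examples only).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: slm-inference          # hypothetical service name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: slm-inference
  template:
    metadata:
      labels:
        app: slm-inference
    spec:
      containers:
        - name: triton
          image: nvcr.io/nvidia/tritonserver:24.05-py3   # example tag
          resources:
            limits:
              nvidia.com/mig-1g.5gb: 1   # one MIG slice, not a whole GPU
```

Requesting a MIG slice this way lets several inference pods share one physical GPU with hardware-level isolation, which is typically how serving small models is made cost-efficient.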
Responsibilities
- Deploy scalable, production-ready ML services with optimized infrastructure
- Manage and autoscale Kubernetes clusters
- Optimize GPU resources using MIG and NOS
- Manage cloud storage to ensure high availability and performance
- Integrate LoRA and model-merging workflows
- Adapt and deploy state-of-the-art ML codebases
- Deploy and manage LLMs, SLMs, and LMMs
- Serve models using Triton Inference Server and other serving frameworks
- Leverage vLLM and TGI for model serving
- Optimize models with ONNX and TensorRT
- Develop Retrieval-Augmented Generation (RAG) systems
- Set up monitoring and logging with Grafana, Prometheus, Loki, Elasticsearch, and OpenSearch
- Write and maintain CI/CD pipelines using GitHub Actions
- Create Helm templates for rapid Kubernetes node deployment
- Automate workflows using cron jobs and Airflow DAGs
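The CI/CD and Helm responsibilities above typically combine into a workflow that builds a serving image and rolls out a Helm release. This is a minimal sketch, assuming a GHCR image, a local chart path, and cluster credentials already configured on the runner — none of these specifics come from the posting.

```yaml
# Hypothetical GitHub Actions workflow: build/push the inference image,
# then upgrade the Helm release on pushes to main. Image name and chart
# path are assumptions for illustration.
name: deploy-inference
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and push image
        run: |
          docker build -t ghcr.io/example/inference:${{ github.sha }} .
          docker push ghcr.io/example/inference:${{ github.sha }}
      - name: Upgrade Helm release
        run: |
          helm upgrade --install inference ./charts/inference \
            --set image.tag=${{ github.sha }}
```

Pinning the image tag to the commit SHA keeps each deployment traceable and makes `helm rollback` straightforward when a release misbehaves.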
Tech Stack
NOS, GitHub Actions, ONNX, fine-tuning, monitoring, GPU, Loki, S3, model serving, LMM