Senior AI/ML Engineer - MLOps & Production AI Systems - Remote or Hybrid in MN/DC

About the job

As part of Optum AI, UnitedHealth Group's enterprise AI organization, you will build and scale production-grade machine learning and generative AI systems that directly impact patient outcomes, clinical efficiency, and enterprise automation. This team operates at the intersection of healthcare and cutting-edge AI, developing platforms and capabilities used across the enterprise.

Responsibilities

Design, build, and maintain end-to-end ML platforms and pipelines (training, validation, deployment, and monitoring)

Productionize ML models using batch and real-time inference architectures (APIs, streaming, event-driven systems)

Develop and manage ML lifecycle workflows using tools such as MLflow, Kubeflow, SageMaker, or Azure ML

Build and maintain CI/CD pipelines for ML, extended with continuous training (CI/CT/CD), including automated testing, validation, and model promotion

Containerize and deploy ML workloads using Docker and Kubernetes, ensuring scalability and reliability

Implement infrastructure-as-code (Terraform or equivalent) for reproducible and secure ML environments

Develop monitoring and observability solutions for model performance, drift, latency, and data quality

Automate retraining and redeployment workflows based on performance degradation or new data availability

Partner with cross-functional teams to define and enforce ML engineering standards and best practices

Ensure compliance with enterprise governance, security, and Responsible AI requirements

Qualifications

Minimum

Bachelor's degree in Computer Science, Engineering, or related field OR 4+ years of equivalent experience

5+ years of experience in ML Engineering / MLOps with production deployment of machine learning systems

3+ years of experience with ML lifecycle tools (MLflow, Kubeflow, SageMaker, Azure ML, or similar)

3+ years of experience with Docker and Kubernetes in production environments

3+ years of experience building CI/CD pipelines for ML using Git-based workflows and automation tools

2+ years of experience with cloud platforms (AWS, Azure, or GCP) for ML workloads

Experience with real-time and batch inference systems, including streaming platforms (e.g., Kafka, Kinesis, Event Hubs)

Solid programming experience in Python (5+ years) with ML frameworks (PyTorch, TensorFlow, or scikit-learn)

Preferred

7+ years of experience in ML engineering or distributed systems

Experience with feature stores (e.g., Feast) and data versioning systems

Hands-on experience with distributed data processing frameworks (Spark, Ray)

Experience with workflow orchestration tools (Airflow, Dagster, Prefect)

Experience with multi-cloud or hybrid cloud ML deployments

Knowledge of Responsible AI, bias detection, and model explainability techniques

Familiarity with observability tools (Prometheus, Grafana, OpenTelemetry)

Proven contributions to open-source ML or MLOps projects