Lead Software Engineer - Remote · Optum / UnitedHealth Group

About the job

We are seeking a Lead Software Engineer to join the Claims Accumulator Hub team. The role covers development and support, including application deployments and production support, as well as implementing machine learning solutions such as generative AI to improve Priority 1 application support. The role may also include managing direct reports. The position requires knowledge of both traditional software development life cycle (SDLC) methods and modern AI algorithms; proficiency in full-stack Java, Kafka, React, Node.js, and Azure; experience with the GenAI stack, including retrieval-augmented generation (RAG) and agents; expertise in AI evaluation tools and techniques; and an ongoing commitment to the responsible use of AI in regulated healthcare environments.

Responsibilities

Lead and manage a high-performing team of software engineers, providing coaching and mentorship and fostering a collaborative culture.
Partner with Product Owners and stakeholders to define scope, requirements, priorities, and delivery expectations.
Develop and maintain project plans, technical roadmaps, and execution strategies for new features, enhancements, and platform initiatives.
Oversee engineering activities end to end, from design and development through deployment, production operations, and continuous improvement, while remaining hands-on when needed.
Drive Agile practices and ceremonies (sprint planning, stand-ups, backlog refinement, reviews) to ensure predictable, high-quality delivery.
Lead incident response, root cause analysis, and postmortems, and implement corrective actions that improve reliability and reduce recurrence.
Architect, evolve, and manage observability platforms using OpenTelemetry, Splunk, Grafana, or similar tools.
Build and maintain automation frameworks and tools that reduce manual work, improve consistency, and increase operational efficiency.
Ensure engineering excellence through robust vulnerability management, automated testing, strong code quality standards, and performance monitoring.
Partner with SREs, platform teams, and global engineering groups to align architecture, priorities, and delivery outcomes.
Identify opportunities to improve system reliability, performance, scalability, and operational maturity, and lead the implementation of those improvements.
Leverage AI-powered tools and platforms to enhance observability, incident response, and operational efficiency.
Support cloud modernization initiatives, including on-prem-to-cloud migration strategy and implementation.
Create and maintain architectural documentation, including diagrams, technical specifications, and guidelines.

Qualifications

Minimum

Bachelor's degree in Computer Science or Engineering, or equivalent experience.
7+ years of experience in SDLC, SRE, DevOps, or related engineering domains.
3+ years of experience leading engineering or operations teams.
Hands-on experience architecting and implementing observability solutions (OpenTelemetry, Splunk, Grafana, Prometheus).
Hands-on experience applying AI/GenAI capabilities, such as RAG, agents, or ML-driven anomaly detection, to improve observability, incident response, or operational workflows.
Deep understanding of systems architecture, cloud infrastructure, networking, security, and automation.
Solid proficiency with CI/CD tools such as GitHub Actions, Jenkins, or Azure DevOps.
Proficiency with scripting or programming languages (Python, Bash, PowerShell) for automation and tooling.
Proven expertise with public cloud platforms (preferably Azure) and experience supporting or driving on-prem-to-cloud migrations.

Preferred

Leadership experience in globally distributed teams.
Experience with claims preprocessing workflows, regulatory SLAs, HIPAA-aligned operational controls, and X12 EDI transactions including 837, 835, 277, and 271.
Experience building custom telemetry pipelines or integrating AI/ML for anomaly detection.
Experience with IaC tools (Terraform, ARM/Bicep, Ansible).
Experience working in highly regulated or enterprise-scale environments.
A solid background in performance engineering, capacity planning, or large-scale distributed systems.
Demonstrated success driving engineering excellence initiatives (DevSecOps, automated testing frameworks, SRE best practices).
Exposure to AI tools and their application in SRE workflows for faster delivery and smarter operations.