Sr. Director, Machine Learning Engineering (Remote-Eligible)

Capital One
Remote / McLean, VA, USA | 2026-04-07 | Full time

About the job

At Capital One, we are creating responsible and reliable AI systems, changing banking for good. For years, Capital One has been an industry leader in using machine learning to create real-time, personalized customer experiences. Our investments in technology infrastructure and world-class talent, along with

Responsibilities

Lead and scale a high-performing engineering organization responsible for the Personalization Platform that powers real-time, personalized product experiences and multi-channel targeted user messaging across Capital One products and services

Define the technical strategy, delivery roadmap, and operating model for a portfolio spanning recommendation systems, ranking, decisioning, GenAI infrastructure, MLOps, and low-latency application-serving systems

Build, develop, and manage engineers and engineering leaders; set a high bar for hiring, performance, talent density, coaching, and succession planning across the organization

Partner cross-functionally with Product, Data Science, Cloud Infrastructure, and Machine Learning Platform teams to align strategy, prioritize investments, and co-develop advanced recommendation systems and algorithms serving Capital One users

Drive the design, buildout, and operation of robust ML infrastructure and pipelines supporting feature extraction, model training, testing, guardrails, evaluation, deployment, and both real-time and batch inference with strong reliability, scalability, and operational rigor

Architect low-latency, event-driven systems for real-time personalization and decisioning based on streaming data, user behavior, and contextual signals

Drive the evolution of MLOps practices through automated, metrics-backed deployment workflows, validation and testing systems, model lifecycle governance, and scalable observability

Guide the adoption of state-of-the-art AI and LLM optimization techniques to improve scalability, cost, latency, throughput, and reliability of large-scale production AI systems

Provide organizational technical and people leadership by influencing architecture, engineering standards, delivery excellence, incident management, and cross-team strategy while mentoring managers, tech leads, and senior engineers

Make high-judgment build-vs-buy decisions across a broad stack of open-source and SaaS AI technologies such as AWS UltraClusters, Hugging Face, vector databases, NeMo Guardrails, PyTorch, and more

Attract and retain top talent in the AI industry and nurture personal and professional development for your team. Foster a culture of learning and staying abreast of the state of the art in AI.

Qualifications

Minimum

Bachelor's degree in Computer Science, Engineering, or AI plus at least 10 years of experience developing or leading AI and ML algorithms or technologies, or Master's degree plus at least 8 years of experience developing or leading AI and ML algorithms or technologies

At least 5 years of people leadership experience

Preferred

7 years of experience managing and leading an engineering team

8+ years of experience deploying scalable, responsible AI solutions on major cloud platforms (AWS, GCP, Azure)

Master’s or PhD in Computer Science or a relevant technical field

Proven expertise designing, implementing, and scaling personalization platforms and recommendation systems across feed personalization, ads ranking, or targeted marketing messaging

Proficiency in Python, Java, C++, or Golang; hands-on experience with ML frameworks (PyTorch, TensorFlow) and orchestration tools (Databricks, Airflow, Kubeflow)

Experience optimizing large-scale training and inference systems for hardware utilization, latency, throughput, and cost

Deep expertise in cloud-native engineering, containerization (Docker, Kubernetes), and automated CI/CD deployment

Deep experience with MLOps, model observability, and production ML lifecycle management

Strong track record building organizations, developing managers and senior engineers, and leading through scale and ambiguity

Excellent communication and presentation skills, with the ability to influence senior stakeholders and articulate complex AI concepts clearly to peers

Proven leadership in driving platform strategy, cross-functional execution, and technical direction across a large organization