About the job
Sr. Director, Machine Learning Engineering (Remote-Eligible)
At Capital One, we are creating responsible and reliable AI systems, changing banking for good. For years, Capital One has been an industry leader in using machine learning to create real-time, personalized customer experiences. Our investments in technology infrastructure and world-class talent, along with
Responsibilities
Lead and scale a high-performing engineering organization responsible for the Personalization Platform that powers real-time, personalized product experiences and multi-channel targeted user messaging across Capital One products and services.
Define the technical strategy, delivery roadmap, and operating model for a portfolio spanning recommendation systems, ranking, decisioning, GenAI infrastructure, MLOps, and low-latency application-serving systems
Build, develop, and manage engineers and engineering leaders; set a high bar for hiring, performance, talent density, coaching, and succession planning across the organization
Partner cross-functionally with Product, Data Science, Cloud Infrastructure, and Machine Learning Platform teams to align strategy, prioritize investments, and co-develop advanced recommendation systems and algorithms serving Capital One users
Drive the design, buildout, and operation of robust ML infrastructure and pipelines supporting feature extraction, model training, testing, guardrails, evaluation, deployment, and both real-time and batch inference with strong reliability, scalability, and operational rigor
Architect low-latency, event-driven systems for real-time personalization and decisioning based on streaming data, user behavior, and contextual signals
Drive the evolution of MLOps practices through automated, metrics-backed deployment workflows, validation and testing systems, model lifecycle governance, and scalable observability
Guide the adoption of state-of-the-art AI and LLM optimization techniques to improve scalability, cost, latency, throughput, and reliability of large-scale production AI systems
Provide technical and people leadership across the organization by influencing architecture, engineering standards, delivery excellence, incident management, and cross-team strategy while mentoring managers, tech leads, and senior engineers.
Make high-judgment build-vs-buy decisions across a broad stack of open-source and SaaS AI technologies such as AWS UltraClusters, Hugging Face, vector databases, NeMo Guardrails, PyTorch, and more.
Attract and retain top talent in the AI industry and nurture personal and professional development for your team. Foster a culture of learning and staying abreast of the state-of-the-art in AI.
Qualifications
Minimum
Bachelor's degree in Computer Science, Engineering, or AI plus at least 10 years of experience developing or leading AI and ML algorithms or technologies, or Master's degree plus at least 8 years of experience developing or leading AI and ML algorithms or technologies
At least 5 years of people leadership experience
Preferred
7 years of experience managing and leading an engineering team
8+ years of experience deploying scalable, responsible AI solutions on major cloud platforms (AWS, GCP, Azure)
Master’s or PhD in Computer Science or a relevant technical field
Proven expertise designing, implementing, and scaling personalization platforms and recommendation systems across feed personalization, ads ranking, or targeted marketing messaging
Proficiency in Python, Java, C++, or Golang; hands-on experience with ML frameworks (PyTorch, TensorFlow) and orchestration tools (Databricks, Airflow, Kubeflow)
Experience optimizing large-scale training and inference systems for hardware utilization, latency, throughput, and cost
Deep expertise in cloud-native engineering, containerization (Docker, Kubernetes), and automated CI/CD deployment
Deep experience with MLOps, model observability, and production ML lifecycle management
Strong track record building organizations, developing managers and senior engineers, and leading through scale and ambiguity
Excellent communication and presentation skills, with the ability to influence senior stakeholders and articulate complex AI concepts clearly to peers
Proven leadership in driving platform strategy, cross-functional execution, and technical direction across a large organization