Sr. Distinguished Machine Learning Engineer (Remote-Eligible)

About the job

At Capital One, we are creating responsible and reliable AI systems, changing banking for good. For years, Capital One has been an industry leader in using machine learning to create real-time, personalized customer experiences. Our investments in technology infrastructure and world-class talent, along with our deep experience in machine learning, position us to be at the forefront of enterprises leveraging AI. From informing customers about unusual charges to answering their questions in real time, our applications of AI & ML are bringing humanity and simplicity to banking. We are committed to continuing to build world-class applied science and engineering teams to deliver our industry-leading capabilities with breakthrough product experiences and scalable, high-performance AI infrastructure. At Capital One, you will help bring the transformative power of emerging AI capabilities to reimagine how we serve our customers and businesses who have come to love the products and services we build.

Responsibilities

Define and drive technical strategy and roadmap for our Personalization Platform that powers real-time, personalized product experiences and multi-channel targeted user messaging across all Capital One products and services.

Partner cross-functionally with Product, Data Science, Cloud Infrastructure, and Machine Learning Platform teams to align on and co-develop the advanced recommendation systems and algorithms serving our Capital One users.

Develop and maintain a flexible, scalable rules engine to enable business-driven personalization logic, allowing dynamic configuration of user segmentation, targeting rules, and real-time decisioning while integrating seamlessly with ML-driven recommendations.

Design, build, and maintain robust ML infrastructure and pipelines to support end-to-end workflows including feature extraction, model training, testing, guardrails, model evaluation, deployment, and both real-time and batch inference, ensuring high performance, scalability, and reliability.

Architect low-latency, event-driven systems for enabling real-time dynamic personalization and decisioning based on streaming data, user behavior, and contextual signals.

Drive the evolution of MLOps practices by building automated, metrics-backed deployment workflows, integration validation and testing systems, and scalable monitoring and observability.

Invent and introduce state-of-the-art LLM optimization techniques to improve the performance (scalability, cost, latency, throughput) of large-scale production AI systems.

Leverage a broad stack of open-source and SaaS AI technologies such as AWS UltraClusters, Hugging Face, vector databases, NeMo Guardrails, PyTorch, and more.

Provide organizational technical leadership by influencing architecture, engineering standards, and cross-team strategies, mentoring engineers, and driving organization-wide platform innovation.

Qualifications

Minimum

Bachelor’s degree

At least 10 years of experience designing and building data-intensive solutions using distributed computing

At least 7 years of experience programming in C, C++, Python, or Scala

At least 4 years of experience with the full ML development lifecycle using modern technology in a business-critical setting

Preferred

8+ years of experience deploying scalable, responsible AI solutions on major cloud platforms (AWS, GCP, Azure); Master's or PhD in Computer Science or a relevant technical field.

5+ years of proven expertise in designing, implementing, and scaling personalization platforms and recommendation systems serving one or more areas of Feed Personalization/Ads Ranking/Targeted Marketing Messaging.

5+ years of strong proficiency in Python, Java, C++, or Golang; hands-on experience with ML frameworks (PyTorch, TensorFlow) and orchestration tools (Databricks, Airflow, Kubeflow).

5+ years of experience developing and applying state-of-the-art techniques for optimizing training and inference systems to improve hardware utilization, latency, throughput, and cost.

5+ years of deep expertise in cloud-native engineering, containerization (Docker, Kubernetes), and automated CI/CD deployment.

Passion for staying on top of the latest AI research and AI systems, and for judiciously applying novel techniques in production

Excellent communication and presentation skills, with the ability to articulate complex AI concepts to peers

Proven leadership in driving platform strategy, fostering cross-functional collaboration, and influencing technical direction across the company.