Principal Software Engineer, ML System Architect

About the job

Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building the Waymo Driver—The World's Most Experienced Driver™—to improve access to mobility while saving thousands of lives now lost to traffic crashes. The Waymo Driver powers Waymo’s fully autonomous ride-hail service and can also be applied to a range of vehicle platforms and product use cases. The Waymo Driver has provided over ten million rider-only trips, enabled by its experience autonomously driving over 100 million miles on public roads and tens of billions in simulation across 15+ U.S. states. Waymo’s Systems Intelligence and ML team works with Research and Production teams to develop and deploy models that are core to our autonomous driving software. Waymo's AI is at the heart of this mission, and we are increasingly leveraging large-scale Foundation Models to unlock new capabilities for the Waymo Driver. Join Waymo to architect and build a unified, large-scale AI platform leveraging Google DeepMind's latest foundation models (like Gemini) for comprehensive world understanding and generation, to accelerate the development and distillation of models powering the world's most experienced driver.

Responsibilities

Architect ML Systems: Define and drive the technical roadmap for the platform, encompassing codebase unification, data pipelines, model architecture, training recipes, and evaluation frameworks.

Codebase Consolidation & Best Practices: Lead the unification of forked foundation model component codebases into a single production-hardened, shared repository. Establish and enforce rigorous coding standards, testing practices, and API designs to ensure long-term codebase health and developer velocity.

Google DeepMind Integration & API Definition: Serve as the primary technical interface between Waymo's offboard model development and Google DeepMind's core model and framework teams. Define clear APIs and integration patterns, ensuring Waymo can seamlessly leverage and contribute to Google DeepMind's advancements while maintaining stability and control.

Unify Core Components: Drive the consolidation of tokenization/de-tokenization strategies, data formats, input pipelines, and evaluation methodologies across all offboard Foundation Model use cases.

Scalable Training & Distillation: Architect efficient large-scale distributed training and establish a common, efficient distillation setup to transfer knowledge from large teacher models to onboard student models.

Technical Leadership & Influence: Provide technical mentorship, guidance, and direction to engineers across multiple teams within SIML and AI Foundations. Drive alignment on technical decisions with senior stakeholders across Waymo and Google DeepMind.

Drive Efficiency: Instill a culture of efficiency in model development, training, and resource utilization, aiming for high ML Productivity.

Qualifications

Minimum

Master's degree or PhD in Computer Science or a related field.

12+ years of experience in software engineering, with at least 8 years focused on large-scale machine learning systems, deep learning frameworks, and AI infrastructure.

A track record of architecting and delivering complex, high-impact ML platforms or models.

Deep expertise in Python, C++, and ML frameworks like JAX and TensorFlow.

Extensive experience with large-scale distributed training on TPUs/GPUs and associated challenges.

Demonstrated ability to design robust, scalable, and maintainable software architectures and APIs.

Understanding of data pipelines, storage systems, and tokenization techniques.

Experience working effectively with research and product teams, and influencing across organizational boundaries.

Technical leadership skills, with the ability to drive strategy, influence across teams, and mentor other engineers.

Communication skills, with the ability to convey complex technical ideas and vision clearly and to drive alignment.

Preferred

Experience with multimodal and generative models.

Experience in autonomous vehicle systems or robotics.

Contributions to open-source ML frameworks or widely used internal tools.

Experience with simulation systems.