About the job
Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. The Simulation ML Infrastructure team builds scalable AI/ML infrastructure that helps the Simulator team innovate sustainably and build state-of-the-art simulations of realistic environments for testing and training the Waymo Driver. We seek an experienced senior IC to lead the development of advanced AI/ML infrastructure for multi-billion parameter foundation models in ML accelerator-friendly simulations.
Responsibilities
Be part of a world-class, high-performing research engineering team advancing the state of the art in ultra-realistic multi-agent simulations using foundation models.
Collaborate closely with the core Waymo Realism Modeling team in London and Waymo Oxford to use large foundation models to improve sim realism.
Work at the intersection of data engineering, model development, and simulations, and drive architectural decisions. Own large, complex systems, driving architectures and designs that meet technical and business objectives.
Design and scale large distributed systems covering the ML lifecycle, supporting planet-scale dataset generation, model training, and evaluation.
Collaborate cross-functionally to derive performance and system-level requirements for large ML systems. Translate product/business goals into measurable technical deliverables, ensuring system component alignment.
Qualifications
Minimum
No minimum qualifications listed.
Preferred
5+ years of professional software engineering experience, with at least 3 years in machine learning infrastructure such as developing, designing, scaling, training, deploying, and optimizing large-scale machine learning systems from data to model.
Solid experience in the development and optimization of machine learning infrastructure tools like DeepSpeed, PyTorch, TensorFlow, Ray, or similar frameworks.
Strong understanding of state-of-the-art machine learning models and algorithms such as autoregressive transformers, and familiarity with scaling large models across ML accelerators and using profiling tools to uncover performance bottlenecks.
Strong leadership skills with experience driving ambiguous problems end-to-end, and the willingness and independence to pick up whatever knowledge is needed to get the job done. Passionate about building infrastructure, libraries, tools, and pipelines for engineers and scientists.
Excellent communication skills, both verbal and written, with the ability to translate complex technical concepts for a broad audience.
Practical familiarity with autonomous driving, simulations, and ML accelerators is a plus.