About the job
The Large Model Evaluation team is at the nexus of Waymo’s AI ambition. With advancements in Large Language Models (LLMs) and Vision-Language Models (VLMs), Waymo is building state-of-the-art AI systems that handle the full complexity of real-world driving. Our progress is ultimately defined by our ability to measure it. While robust evaluation is the bottleneck for deploying any large model, the challenge at Waymo is uniquely complex and safety-critical. We are looking for quantitatively minded engineers to research and propose new ways to assess the ML models deployed in the Waymo Driver.
Responsibilities
Develop novel metrics and sampling techniques to measure the driving trajectories generated by ML models.
Employ creative simulation strategies to measure the driving performance of generative AI models, identify potential edge cases, and provide reliable performance insights that inform model development and deployment.
Build data pipelines for signal discovery, data labeling, feature extraction and metric computation based on large-scale simulations.
Conduct data analysis to diagnose regressions in ML models.
Collaborate with world-class engineering and research teams that develop large-scale ML models.
Qualifications
Minimum
BS/MS/PhD in Computer Science, Machine Learning, Robotics, Statistics, Physics, Math or another quantitative area
Proficiency in programming in Python or C++
Knowledge of AI fundamentals, such as transformer architectures and distillation techniques
Demonstrated industry or research experience with creative problem solving and rigorous data analysis of open-ended quantitative problems
Preferred
Familiarity with at least one modern deep learning framework (e.g., JAX, TensorFlow, PyTorch)
Experience evaluating the quality of ML models