Senior Software Engineer, Large Model Evaluation

Waymo
Mountain View, California, USA / San Francisco, California, USA / Mountain View (US-MTV-EMF680), Mountain View, California, United States
2025-12-17

About the job

The Large Model Evaluation team is at the nexus of Waymo’s AI ambition. With advancements in Large Language Models (LLMs) and Vision-Language Models (VLMs), Waymo is building state-of-the-art AI systems that handle the full complexity of real-world driving. Our progress is defined by our ability to measure it. While robust evaluation is the bottleneck for deploying any large model, the challenge at Waymo is uniquely complex and safety-critical. We are looking for quantitatively minded engineers to research and propose new ways to assess the ML models deployed in the Waymo Driver.

Responsibilities

Develop novel metrics and sampling techniques to measure the driving trajectories generated by ML models.

Employ creative simulation strategies to measure the driving performance of generative AI models. Identify potential edge cases, and provide reliable performance insights that inform model development and deployment.

Build data pipelines for signal discovery, data labeling, feature extraction and metric computation based on large-scale simulations.

Conduct data analysis to diagnose regressions in ML models.

Collaborate with world-class engineering and research teams that develop large-scale ML models.

Qualifications

Minimum

5+ years of relevant industry experience in a heavily quantitative software engineering area.

Experience navigating complex technical and product landscapes, defining technical strategy, and creating roadmaps.

Software Engineering Fundamentals:

Proficiency in programming in Python or C++.

Experience with software design principles, coding best practices, testing methodologies, and version control software.

Experience building software pipelines for data processing, system evaluation, or metric computation, in the context of large-scale systems.

Machine Learning & Quantitative Experience:

Knowledge of AI fundamentals, such as transformer architectures, distillation techniques, etc.

Experience evaluating the quality of ML models.

Demonstrated experience taking quantitative findings through to productionized tools.

Preferred

Experience with simulation systems, robotics, or autonomous vehicles.

Familiarity with one of the modern deep learning frameworks (e.g. JAX, TensorFlow).