Senior Software Engineer, Simulator Evaluation

About the job

Waymo is an autonomous driving technology company with the mission to be the world's most trusted driver. Since its start as the Google Self-Driving Car Project in 2009, Waymo has focused on building the Waymo Driver (The World's Most Experienced Driver™) to improve access to mobility while saving thousands of lives now lost to traffic crashes. The Waymo Driver powers Waymo’s fully autonomous ride-hail service and can also be applied to a range of vehicle platforms and product use cases. The Waymo Driver has provided over ten million rider-only trips, enabled by its experience autonomously driving over 100 million miles on public roads across 15+ U.S. states and tens of billions of miles in simulation.

Waymo’s simulator is one of the most complex virtual environments ever built. It blends deterministic logic, physical dynamics, and state-of-the-art Generative AI to create a training ground for the Waymo Driver. The Simulator Evaluation team faces the ultimate data challenge: How do you mathematically prove that a virtual world is 'real'?

Responsibilities

Architect the Eval Rubric: You will develop novel methodologies to evaluate the simulator across the stack. You will distinguish genuine driving challenges from realism artifacts, whether the artifact is a logic gap, a physics glitch, or a model hallucination.

Build at Scale: You will design and implement high-throughput pipelines (C++/Python) capable of processing massive datasets of simulation logs. You will turn raw, noisy data into clear, actionable signals.

The 'Critic' for the System: You will partner closely with AI research teams and other simulation teams; the eval workflows you build will drive rapid innovation and shape research roadmaps.

Strategic Leadership: You will navigate ambiguity to determine what matters most for realism. You will lead the strategy for specific domains, ensuring our evaluation evolves as fast as our simulation technology.

Qualifications

Minimum

5+ years of software development experience.

Proficiency in Python or C++, with experience building scalable data processing systems or evaluation frameworks.

Strong software design principles: You write clean, testable code that is built to last.

A 'Data Detective' mindset: You can look at a distribution of outcomes and intuitively spot anomalies, selection bias, or system errors.

Experience designing and implementing evaluation frameworks for complex systems or machine learning models.

Comfort working with complex, hybrid systems. You understand how to evaluate different types of 'black boxes,' whether they are heuristic-based, physics-based, or learned models.

Preferred

Background in fields that blend code, math, and simulation: Autonomous Vehicles, Algorithmic Trading, AdTech/Search Ranking, Machine Learning, or Robotics.

Experience with SQL and the Python data stack (Pandas, NumPy, SciPy).

Familiarity with evaluating Generative AI / LLMs or experience with agent-based modeling and behavioral logic.

Experience taking a metric from 'research concept' to 'production pipeline.'