Senior Software Engineer, Simulator Evaluation

Waymo
Mountain View, CA, USA / San Francisco, CA, USA / Mountain View (US-MTV-EMF680), Mountain View, California, United States
2025-03-06

About the job

Waymo’s simulator is one of the most complex virtual environments ever built. It blends deterministic logic, physical dynamics, and state-of-the-art Generative AI to create a training ground for the Waymo Driver. The Simulator Evaluation team faces the ultimate data challenge: How do you mathematically prove that a virtual world is 'real'? We are looking for a Senior Software Engineer to build the metrics and systems that grade this hybrid environment. You will work at the intersection of software engineering and AI, ensuring that our simulated worlds—whether driven by explicit rules or foundation models—provide a trustworthy representation of reality.

Responsibilities

Architect the Eval Rubric: You will develop novel methodologies to evaluate the simulator across the stack. You will distinguish between true driving challenges and realism artifacts, whether the cause is a logic gap, a physics glitch, or a model hallucination.

Build at Scale: You will design and implement high-throughput pipelines (C++/Python) capable of processing massive datasets of simulation logs. You will turn raw, noisy data into clear, actionable signals.

The 'Critic' for the System: You will partner closely with AI research and other simulation teams. The eval workflows you build will drive rapid innovation and shape research roadmaps.

Strategic Leadership: You will navigate ambiguity to determine what matters most for realism. You will lead the strategy for specific domains, ensuring our evaluation evolves as fast as our simulation technology.

Qualifications

Minimum

5+ years of software development experience.

Proficiency in Python or C++, with experience building scalable data processing systems or evaluation frameworks.

Strong software design principles: you write clean, testable code that is built to last.

Preferred

Background in fields that blend code, math, and simulation: Autonomous Vehicles, Algorithmic Trading, AdTech/Search Ranking, Machine Learning, or Robotics.

Experience with SQL and the Python data stack (Pandas, NumPy, SciPy).

Familiarity with evaluating Generative AI / LLMs or experience with agent-based modeling and behavioral logic.

Experience taking a metric from 'research concept' to 'production pipeline.'