About the job
The Gemini Evals team is responsible for developing new ways to test and measure model performance. This is an open-ended problem that requires close collaboration between research and engineering and the application of state-of-the-art AI/ML to conversational problems.
Responsibilities
Collaborate closely with the engineering team to support infrastructure development, meticulously track priorities and issues, deeply understand evaluation and experimentation results, and coordinate alignment and assessment with model teams.
Collaborate with data scientists to design and integrate model evals, execute evals, and conduct loss analysis to inform model quality.
Manage the Gemini autorater platform and compute capacity planning for the Gemini Eval area.
Provide proactive status updates to stakeholders and effectively triage autorater issues back to the modeling team.
Guide strategic goals and tactical delivery.
Lead with high agency.
Proactively identify workflow gaps and implement scalable improvements to optimize efficiency and output quality.
Qualifications
Minimum
Bachelor's degree in a technical field, or equivalent practical experience.
8 years of experience leading engineering projects across multiple geographies and time zones.
3 years of experience in people management.
Preferred
Experience in Large Language Model (LLM) evaluations, model training, model releases, or data science.
Ability to quickly learn and deeply understand the technical aspects of the programs, from interface to infrastructure, serving, and customer issues, and to drive technical discussions.
Excellent skills in managing complex stakeholder relationships.