Engineering Analyst, Trust and Safety, RAI Novel Testing

About the job

Novel Testing is a team within Trust and Safety specializing in complex testing, defining protocols and methodologies for assessing risk where best practices do not currently exist. We pioneer and scale innovative testing programs, streamlining the launch of trustworthy, novel, responsible AI (RAI) products. Work spans from designing first-of-their-kind evaluations for Google’s most ambitious product bets – including autonomous agents, personalization, and the latest hardware – to developing new methodologies for assessing novel foundational model capabilities as they emerge. Advancing the state of the art in AI evaluation is central to this mission.

Responsibilities

Drive the methodological frontier of model evaluation.

Partner with Google DeepMind to develop novel data-driven methodologies for the structured and unstructured testing of emerging AI products and model capabilities.

Move beyond standard benchmarks, designing sophisticated experimental frameworks and uncovering latent model behaviors and capabilities.

Define testing and safety standards, working with cross-functional colleagues in policy and engineering to ensure they are met.

Perform analyses and drive insights to develop model-level and product-level safety mitigations.

Lead and influence cross-functional teams to implement safety initiatives.

Act as an advisor to executive leadership on complex safety issues.

Represent Google's AI safety efforts in external forums and collaborations, contributing to industry-wide best practices.

Mentor analysts, fostering a culture of excellence and acting as a subject matter expert on adversarial techniques.

Work with graphic, controversial, or upsetting content.

Qualifications

Minimum

Bachelor's degree or equivalent practical experience.

7 years of experience in managing projects and defining project scope, goals, and deliverables.

7 years of experience in data analysis or data science, including identifying trends, generating summary statistics, and drawing insights from quantitative and qualitative data.

5 years of experience in data analysis using SQL or Python.

Preferred

Master's degree or PhD in a relevant quantitative or engineering field.

5 years of experience working in trust and safety operations, data analytics, cybersecurity, or another relevant environment.

Experience working with large language models, LLM operations, prompt engineering, pre-training, and fine-tuning.

Experience in designing and conducting experiments or quantitative research in a technology or AI context.

Experience in AI systems, machine learning, and their potential risks.

Strong technical competency with a data-driven investigative approach to solving complex tests, including proficiency in data manipulation, analysis, and automation using languages like Python and SQL.