Applied Scientist, Artificial General Intelligence

Amazon
USA, MA, Boston / USA, WA, Bellevue | 2026-03-20 | Onsite

About the job

The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a strong background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems. In this role, you will ensure the highest standards of data quality, helping to build industry-leading technology with Large Language Models (LLMs) and multimodal systems.

Responsibilities

- Collaborate closely with the core science team developing the Amazon Nova models.
- Lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows.
- Design auditing strategies, with detailed SOPs, quality metrics, and sampling methodologies, that help Nova improve its performance on benchmarks.
- Perform expert-level manual audits.
- Conduct meta-audits to evaluate auditor performance.
- Provide targeted coaching to raise overall quality capabilities.
- Develop and maintain LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment.
- Configure data collection workflows and communicate quality feedback to stakeholders.
- Directly improve customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services.
- Support quality solution design.
- Conduct root cause analysis on data quality issues.
- Research new auditing methodologies.
- Find innovative ways to optimize data quality while setting an example for the team on quality assurance best practices and standards.
- Work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.

Qualifications

Minimum

- Master's degree in computer science, mathematics, statistics, machine learning, or an equivalent quantitative field
- Experience programming in Java, C++, Python, or a related language
- Experience with SQL and an RDBMS (e.g., Oracle) or a data warehouse

Preferred

- Experience implementing algorithms using both toolkits and self-developed code
- Publications at top-tier peer-reviewed conferences or journals