About the job
The Artificial General Intelligence (AGI) team is seeking a dedicated, skilled, and innovative Applied Scientist with a robust background in machine learning, statistics, quality assurance, auditing methodologies, and automated evaluation systems. This scientist will ensure the highest standards of data quality to help build industry-leading technology with Large Language Models (LLMs) and multimodal systems.
Responsibilities
Lead the development of comprehensive quality strategies and auditing frameworks that safeguard the integrity of data collection workflows.
Design auditing strategies with detailed SOPs, quality metrics, and sampling methodologies that help Nova improve its performance on benchmarks.
Perform expert-level manual audits, conduct meta-audits to evaluate auditor performance, and provide targeted coaching to uplift overall quality capabilities.
Develop and maintain LLM-as-a-Judge systems, including designing judge architectures, creating evaluation rubrics, and building machine learning models for automated quality assessment.
Configure data collection workflows and communicate quality feedback to stakeholders.
Have a direct impact on customer experiences through high-quality training and evaluation data that powers state-of-the-art LLM products and services.
Support quality solution design, conduct root cause analysis on data quality issues, research new auditing methodologies, and find innovative ways to optimize data quality while setting an example for the team on quality assurance best practices and standards.
Work closely with talented engineers, domain experts, and vendor teams to put quality strategies and automated judging systems into practice.
Qualifications
Minimum
Master's degree in computer science, mathematics, statistics, machine learning, or an equivalent quantitative field
Experience programming in Java, C++, Python, or a related language
Experience with SQL and an RDBMS (e.g., Oracle) or a data warehouse
Preferred
Experience implementing algorithms using both toolkits and self-developed code
Publications at top-tier peer-reviewed conferences or journals