About the job
As a Data Scientist, you will play a critical role in evaluation and optimization for user-facing GenAI systems (such as text, image, video, 3D, 4D). You will define how we measure safety, responsibility, quality, and efficiency. You will combine annotation analysis, design of experiments, causal inference, model-based evaluation methods (such as LLM-as-a-judge), optimization algorithms, and AI models to drive product decisions and model improvements.
Responsibilities
Develop Evaluation Frameworks: Design and operationalize rigorous evaluation systems for GenAI features (text, image, video, 3D, 4D). This includes eval experiment design, dataset design, label reliability analysis, and implementing and fine-tuning LLM-as-a-judge methods.
Run Rigorous Experiments: Conduct online experiments (A/B tests) and apply causal inference to quantify the impact of GenAI features. You will identify opportunities, measure lift, and ensure statistical rigor.
Define Success Metrics: Partner with cross-functional teams to define leading and lagging indicators of user satisfaction, business success, and safety for GenAI features.
Build Automated Systems: Research and apply state-of-the-art methodologies to build reproducible evaluation tooling that lifts rigor and efficiency across the company.
Conduct Applied Research at the Frontier: Maintain an active pulse on the intersection of GenAI and Data Science. You will innovate on methodology and techniques to solve unique business challenges while contributing to the broader technical community.
Qualifications
Minimum
A PhD or equivalent, completed or in progress, in Statistics, Economics, Computer Science, Applied Math, Physics, Engineering, or a related quantitative field.
Technical Proficiency: Strong proficiency in SQL (Hive/Spark) for manipulating large datasets and in a scripting language (Python or R) for analysis and modeling.
Experimentation and Causal Inference: A solid grounding in experimentation, causal inference, and statistical analysis, including test design and metric design for feature impact.
Problem Solving: A demonstrated track record of framing ambiguous problems, designing analytical approaches, and solving open-ended data science problems that drive business impact.
Learning Agility: Ability to effectively and responsibly use AI tools to enhance productivity and a passion for continuously improving methods in a fast-evolving field.
Preferred
GenAI Familiarity: Familiarity with GenAI models and safety/quality evaluation methods. Expertise in the model training lifecycle is a plus (e.g., fine-tuning, RLHF, or synthetic data generation).
Applied Research Background: A track record of applied research or publications in relevant technical fields is highly valued.