Data Scientist, AWS Quick Data

About the job

We are seeking a Data Scientist II to join our Quick Data team, focusing on developing evaluation and benchmarking data for Quick Suite features. Our mission is to engineer the high-quality datasets that are essential to the success of Amazon Quick Suite. From human evaluations and Responsible AI safeguards to Retrieval-Augmented Generation and beyond, our work ensures that Generative AI is enterprise-ready, safe, and effective for users at scale. As part of our diverse team (data scientists, engineers, language engineers, linguists, and program managers), you will collaborate closely with science, engineering, and product teams. We are driven by customer obsession and a commitment to excellence.

Responsibilities

Design and develop comprehensive evaluation and benchmarking datasets for Quick Suite AI-powered features

Leverage LLMs to generate synthetic data corpora and to evaluate data and assess its quality in LLM-as-a-judge settings

Create ground truth datasets with high-quality question-answer pairs across diverse domains and use cases

Lead human annotation initiatives and model evaluation audits to ensure data quality and relevance

Develop and refine annotation guidelines and quality frameworks for evaluation tasks

Conduct statistical analysis to measure model performance, identify failure patterns, and guide improvement strategies

Collaborate with ML scientists and engineers to translate evaluation insights into actionable product improvements

Build scalable data pipelines and tools to support continuous evaluation and benchmarking efforts

Contribute to Responsible AI initiatives by developing safety and fairness evaluation datasets

Qualifications

Minimum

2+ years of experience as a data scientist

3+ years of experience with data querying languages (e.g. SQL), scripting languages (e.g. Python), or statistical/mathematical software (e.g. R, SAS, Matlab)

3+ years of experience with machine learning and statistical modeling data analysis tools and techniques, and with the parameters that affect their performance

1+ years of experience working with or evaluating AI systems

1+ years of experience creating or contributing to mathematical textbooks, research papers, or educational content

Master's degree in Science, Technology, Engineering, or Mathematics (STEM), or experience working in Science, Technology, Engineering, or Mathematics (STEM)

Experience applying theoretical models in an applied environment

Preferred

Ph.D. in Science, Technology, Engineering, or Mathematics (STEM)

Knowledge of machine learning concepts and their application to reasoning and problem-solving

Experience in an ML or data scientist role with a large technology company

Experience in defining and creating benchmarks for assessing GenAI model performance

Experience working on multi-team, cross-disciplinary projects

Experience applying quantitative analysis to solve business problems and make data-driven business decisions

Experience communicating complex concepts effectively, both in writing and verbally