About the job
We are seeking a Data Scientist II to join our Quick Data team, focusing on developing evaluation and benchmarking data for Quick Suite features. Our mission is to engineer the high-quality datasets that are essential to the success of Amazon Quick Suite. From human evaluations and Responsible AI safeguards to Retrieval-Augmented Generation and beyond, our work ensures that Generative AI is enterprise-ready, safe, and effective for users at scale. As part of our diverse team of data scientists, engineers, language engineers, linguists, and program managers, you will collaborate closely with science, engineering, and product teams. We are driven by customer obsession and a commitment to excellence.
Responsibilities
Design and develop comprehensive evaluation and benchmarking datasets for Quick Suite AI-powered features
Leverage LLMs to generate synthetic data corpora and to evaluate data and assess its quality in LLM-as-a-judge settings
Create ground truth datasets with high-quality question-answer pairs across diverse domains and use cases
Lead human annotation initiatives and model evaluation audits to ensure data quality and relevance
Develop and refine annotation guidelines and quality frameworks for evaluation tasks
Conduct statistical analysis to measure model performance, identify failure patterns, and guide improvement strategies
Collaborate with ML scientists and engineers to translate evaluation insights into actionable product improvements
Build scalable data pipelines and tools to support continuous evaluation and benchmarking efforts
Contribute to Responsible AI initiatives by developing safety and fairness evaluation datasets
Qualifications
Minimum
2+ years of experience as a data scientist
3+ years of experience with data querying languages (e.g. SQL), scripting languages (e.g. Python), or statistical/mathematical software (e.g. R, SAS, MATLAB)
3+ years of experience with machine learning and statistical modeling tools and techniques for data analysis, and with the parameters that affect their performance
1+ years of experience working with or evaluating AI systems
1+ years of experience creating or contributing to mathematical textbooks, research papers, or educational content
Master's degree in Science, Technology, Engineering, or Mathematics (STEM), or experience working in a STEM field
Experience applying theoretical models in an applied environment
Preferred
Ph.D. in Science, Technology, Engineering, or Mathematics (STEM)
Knowledge of machine learning concepts and their application to reasoning and problem-solving
Experience in an ML or data scientist role with a large technology company
Experience in defining and creating benchmarks for assessing GenAI model performance
Experience working on multi-team, cross-disciplinary projects
Experience applying quantitative analysis to solve business problems and making data-driven business decisions
Experience communicating complex concepts effectively, both in writing and verbally