About the job
We are seeking a highly motivated and analytical AI Experience Researcher to join our team. This role blends cognitive and human sciences, data sciences, systems design, and product evaluation to ensure AI-powered products deliver exceptional and intuitive customer experiences. You will work alongside a small but impactful team, collaborating with ML and data scientists, software engineers, designers, project managers, and other cross-functional teams at Apple to define success criteria for AI experiences, and create rigorous evaluations that measure these criteria in iterative product development cycles. If you're passionate about applying scientific rigor to real-world problems, thrive on innovation, and want your work to impact hundreds of millions of users, this role offers an exceptional opportunity to make a lasting contribution to products people use every day.
Responsibilities
Develop scalable automated evaluation methodologies by operationalizing complex multimodal, multi-turn AI experiences into observable and measurable metrics that work across diverse use cases, features, or product areas
Produce comprehensive evaluation plans detailing evaluation scope, validation and data strategy, tooling requirements, resource allocation, and timelines
Derive experimental designs and write test instructions for LLM judges or for human raters
Define requirements for, or curate, datasets that represent realistic usage; support data generation and annotation workflows to ensure coverage, quality, and alignment with product goals
Implement and analyze automated evaluations, maintaining rigor around reproducibility and identifying key insights and areas for improvement across both qualitative and quantitative patterns
Prepare and present clear, concise, and impactful evaluation findings to diverse stakeholders, translating results into actionable recommendations for model training, ranking, and product decisions
Partner with engineers, QA, data scientists, designers, and product managers throughout the product development lifecycle to integrate evaluation insights and drive continuous improvement
Contribute to evolving human-centered AI evaluation methodologies and help to define best practices for AI experience evaluation as the field matures
Qualifications
Minimum
Advanced degree in Cognitive Psychology, Human-Computer Interaction (HCI), User Experience (UX) Research, Learning Sciences, Learning Analytics, Psychometrics, Applied Behavioral Science, or a related field with a focus on human cognition, behavior, and empirical evaluation
A strong data-driven mindset with experience designing and conducting rigorous empirical research or evaluation — including experimental design, data analysis, and interpretation of various qualitative and quantitative data — particularly in the context of complex human-system interactions
Ability to reason from theoretical grounding about what makes an experience good in a given context, and to translate that reasoning into evaluation frameworks and measurement designs
Demonstrated ability to operationalize research literature, qualitative user feedback, and quantitative behavioral data into actionable evaluation criteria, observable metrics, and product insights
Proficiency in data analysis and interpretation, with a strong understanding of statistical validity in evaluation contexts
Exceptional collaboration skills with a track record of working effectively in cross-functional teams that include engineering, ML, design, QA, leadership, and subject matter experts from diverse domains
Strong communication skills, with the ability to translate complex research findings and evaluation results into clear, actionable recommendations for both technical and non-technical audiences
Preferred
Familiarity with methods for capturing experiential quality beyond task success — such as cognitive interviews, think-aloud protocols, interaction analysis, or discourse and conversation analysis
Experience designing and implementing automated evaluation pipelines, including writing prompts for LLM judges and constructing human-in-the-loop or multi-turn evaluation setups
Experience working with multimodal or agentic systems and AI/ML models, preferably Large Language Models
Familiarity with automated testing frameworks and tooling
Experience with data generation and annotation workflows, including curating datasets, scenarios, and tasks that represent realistic usage
Portfolio demonstrating previous evaluation frameworks, research findings, or measurable contributions to product improvement
Background in learning sciences or instructional design, with experience reasoning about what makes a complex human experience effective, is a plus