AI Research Scientist (Technical Leadership), Data Research

About the job

Meta is seeking research scientists to help us build the data foundation for Meta's most advanced Large Language and Media Models. We're looking for researchers with LLM expertise to join us in working with data at scale and pushing beyond the data ceiling.

Responsibilities

Collaborate with cross-functional teams to develop Meta’s next foundational models

Advance our understanding of data research, such as how to overcome data walls and how best to create synthetic data

Architect efficient and scalable data curation systems and pipelines

Fundamentally improve our data velocity across workflows and projects by contributing to the advancement of data tooling

Execute on high-priority projects in pre-training, mid-training, or post-training data curation

Apply specialized expertise in video/image generation, video/image perception, OCR, agentic data, synthetic data, reasoning data, web parsing, coding data, data scaling laws, or data mix optimization

Lead complex technical projects end-to-end

Qualifications

Minimum

Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience

PhD in Computer Science or a related technical field

4+ years of industry research experience in NLP or CV

4+ years of experience as a formal technical lead

Experience leading major technical initiatives with cross-functional impact and influencing strategy across multiple teams

Practical experience with multimodal pre-training or mid-training data curation for large language models, media perception, or media generation models

Published research in leading peer-reviewed conferences (e.g., ACL, NeurIPS, ICML, ICLR, AAAI, KDD, CVPR, ICCV) and/or demonstrated significant industry influence in the field of AI

Preferred

Experience working on frontier-quality/state-of-the-art Large Language or Large Media Models

First-author publications at top peer-reviewed conferences (e.g., ACL, NeurIPS, ICML, ICLR, AAAI, KDD, CVPR, ICCV)

Programming experience in Python and hands-on experience with frameworks such as PyTorch or Spark, or with related distributed computing frameworks (e.g., Ray, DataFlow)