Research Engineer, Text Data Research - MSL FAIR

Meta
Menlo Park, CA

About the job

Meta is seeking AI research engineers to help us build the data foundation for Meta's most advanced Large Language Models. We're looking for engineers with LLM expertise to join us in working with data at scale and pushing beyond the data ceiling.

Responsibilities

Collaborate with cross-functional teams to develop Meta’s next foundational models

Architect efficient and scalable data curation systems and pipelines

Fundamentally improve our data velocity across workflows and projects by contributing to the advancement of data tooling

Execute on high-priority projects in pre-training, mid-training, or post-training data curation

Apply specialized expertise in agentic data, synthetic data, reasoning data, web parsing, coding data, data scaling laws, or data mix optimization

Lead complex technical projects end-to-end

Qualifications

Minimum

Bachelor's degree in Computer Science, Computer Engineering, relevant technical field, or equivalent practical experience

1+ year of industry research experience in LLM/NLP or related AI/ML models

Experience owning and/or driving complex technical projects end-to-end

Practical experience with pre-training or mid-training data curation for large foundational models and experience working with organic, synthetic, agentic, or reasoning data for LLMs

Demonstrated data infrastructure and software background, and experience building data tooling and services

Published research in leading peer-reviewed conferences (e.g., NeurIPS, ICML, ICLR, ACL, EMNLP) and/or demonstrated significant industry influence in the field of AI

Preferred

Experience working on frontier-quality/state-of-the-art Large Language Models

Master's degree or PhD in Computer Science or a related technical field

Hands-on experience with modeling frameworks like PyTorch

Hands-on experience with SQL and large-scale data handling, with familiarity with frameworks like Spark and Hive