About the job
Meta is seeking Research Engineers to join the Evaluations team within Meta Superintelligence Labs. Evaluations are the core of AI progress at MSL, determining what capabilities get built, which features get prioritized, and how fast our models improve. As a Research Engineer on this team, you will curate and build the benchmarks for our most advanced AI models, across text, vision, audio, and beyond. You'll work alongside world-class researchers and engineers to collect, develop, and deploy novel benchmarks and reinforcement learning environments.
Responsibilities
Curate and integrate publicly available and internal benchmarks to direct capability development for frontier models
Develop and implement evaluation environments, including environments for novel model capabilities and modalities
Collaborate with external data vendors to source and prepare high-quality evaluation datasets
Execute on the technical vision of research scientists designing new benchmarks and evaluations
Build robust, reusable evaluation pipelines that scale across multiple model lines and product areas
Contribute to evaluation tooling that measures the quality and reliability of evaluation suites
Qualifications
Minimum
Currently has, or is in the process of obtaining, a Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta
Bachelor's or Master's degree in Computer Science, Machine Learning, or a related technical field
1+ years of experience in machine learning engineering, machine learning research, or a related technical role
Proficiency in Python and experience with ML frameworks such as PyTorch
Experience identifying, designing, and completing medium-to-large technical features independently
Demonstrated experience with software engineering practices, including version control, testing, and code review
Preferred
Publications at peer-reviewed venues (NeurIPS, ICML, ICLR, ACL, EMNLP, or similar) related to language model evaluation, benchmarking, or deep learning
Hands-on experience with language model post-training and deep learning systems, or building reinforcement learning environments
Experience implementing or developing evaluation benchmarks for large language models and multimodal models (e.g., vision-language, audio, video)
Experience working with large-scale distributed systems and data pipelines
Familiarity with language model evaluation frameworks and metrics
Track record of open-source contributions to ML evaluation tools or benchmarks