Research Engineer, Preparedness - Meta Superintelligence Labs

Meta
Menlo Park, CA

About the job

Meta is seeking Research Engineers to join the Preparedness team within Meta Superintelligence Labs. The Preparedness team evaluates the growing capabilities of our AI systems, with a focus on frontier capabilities and the risks they pose. We ensure that evaluations are in place to help mitigate these risks and support the responsible development of frontier AI.

Responsibilities

Build and continuously refine evaluations for multimodal and agentic frontier AI models, including in cybersecurity, chemical security, and biosecurity

Build robust, reusable evaluation pipelines that scale across multiple model lines and product areas

Produce auditable technical artifacts, including evaluation reports and model cards, with high reliability and speed

Scope and deliver end-to-end evaluations under ambiguous and rapidly shifting requirements, re-prioritizing as the threat landscape and Meta’s frontier models evolve

Work across research, engineering, policy, and legal teams to align evaluation priorities with launch timelines

Qualifications

Minimum

Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience

3+ years of experience in machine learning engineering, machine learning research, or a related technical role

Proficiency in Python and experience with ML frameworks

Experience independently identifying, designing, and completing medium to large technical features

Proven experience with software engineering practices, including version control, testing, and code review

Preferred

Experience implementing or developing benchmarks for agentic large language models and multimodal models (e.g., vision-language, audio, video, browser agents)

Publications at peer-reviewed venues (NeurIPS, ICML, ICLR, ACL, EMNLP, or similar) related to language model evaluation, AI safety, or deep learning

Experience working with large-scale distributed systems and data pipelines

Experience in red-teaming AI systems, adversarial machine learning, or abuse prevention systems

Background in biology or chemistry, particularly in chemical, biological, radiological, and nuclear (CBRN) risk domains, and experience designing evaluations or threat assessments related to dual-use scientific knowledge

Background in cybersecurity, penetration testing, or security research, particularly as it relates to assessing AI-enabled cyber capabilities or designing mitigations for AI-assisted exploitation

Track record of open-source contributions to ML evaluation tools or benchmarks