Magistral 1.2 achieves frontier performance on reasoning and coding benchmarks.
Magistral is the first reasoning model by Mistral.
LlamaRL is the first large-scale RL stack internal to Llama research.
New work on scaling RL to unverifiable domains such as long-form data.
Llama 4 is the first major Llama release trained with a large-scale RL stack.
Led the RL stack development for Llama 3.3.
Part of a project to benchmark scalable-oversight protocols.
Investigated the importance of on-policy sampling in language model alignment.
Four papers accepted at ICML 2024.
Contributed to the development and technical reports of the Gemini and Gemini 1.5 projects.
Research Experience
Worked on reasoning at Mistral. Was part of the Llama research team, spearheading the prototype and algorithmic recipes for online RL and scaling the training to Llama 3.3 and Llama 4; also worked on post-training for reasoning. Core contributor to Gemini v1-1.5 post-training at DeepMind London, focusing on tool use and agents. Researched various aspects of deep RL algorithms and systems.
Education
PhD from Columbia University in New York City. Interned twice at DeepMind Paris, hosted by Remi Munos.
Background
A researcher interested in reinforcement learning. Currently a member of technical staff on the pre-training team at Anthropic.
Miscellany
Besides building industry-grade models, has also spent a limited amount of time on open science.