Magistral 1.2 achieves frontier performance on reasoning and coding benchmarks.
Magistral is the first reasoning model by Mistral.
LlamaRL is the first large-scale RL stack internal to Llama research.
New work on scaling RL to unverifiable domains such as long-form data.
Llama 4 is the first major Llama release trained with a large-scale RL stack.
Led the RL stack development for Llama 3.3.
Part of a project to benchmark scalable-oversight protocols.
Investigated the importance of on-policy sampling in language model alignment.
Four papers accepted at ICML 2024.
Contributed to the development and technical reports of the Gemini and Gemini 1.5 projects.
Research Experience
Worked on reasoning at Mistral. Was part of the Llama research team, spearheading the prototype and algorithmic recipes for online RL and scaling the training to Llama 3.3 and Llama 4; also worked on post-training for reasoning. Core contributor to Gemini v1-1.5 post-training at DeepMind London, focusing on tool use and agents. Researched various aspects of deep RL algorithms and systems.
Education
PhD from Columbia University in New York City. Interned twice at DeepMind Paris, hosted by Remi Munos.
Background
A researcher interested in reinforcement learning. Currently a member of technical staff on the pre-training team at Anthropic.
Miscellany
Besides building industry-grade models, has also spent a limited amount of time on open science.