Published 'MathArena: Evaluating LLMs on Uncontaminated Math Competitions' on arXiv, 2025; 'Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad' also on arXiv, 2025; 'MathConstruct: Challenging LLM Reasoning with Constructive Proofs' accepted at ICML 2025; the AgentDojo project won first prize in the SafeBench competition (50,000 USD).
Research Experience
Currently a Senior Research Scientist at Google DeepMind; previously worked on securing AI agents at a startup; interned at Twitter, Facebook, and SigOpt.
Education
PhD from ETH Zurich (advisor not listed).
Background
He is a Senior Research Scientist at Google DeepMind. His research interests include creating benchmarks such as MathArena, which evaluates LLMs on the latest math competitions. He received his PhD from ETH Zurich, where his work was featured in media outlets such as Ars Technica, Forbes, and Wired. During high school, he won a gold medal at the IMO and silver medals at the IOI.
Miscellany
Released matharena.ai, a website for evaluating LLMs on the latest math competitions.