Browse publications on Google Scholar (top-right) ↗
Resume (English only)
Academic Achievements
Grok-Code-Fast-1 (2025): xAI's first agentic coding model, on the latency-performance Pareto frontier
Grok-4 (2025): world's most intelligent LLM, trained with pre-training-scale reinforcement learning
Search Arena (Preprint 2025): introduced a large-scale human-preference dataset of 24,000+ multi-turn interactions with search-augmented LLMs
SkyRL-v0 (2025): achieved strong results on SWE-Bench-Verified using a novel RL training pipeline
Learning Adaptive Parallel Reasoning (COLM 2025): proposed the APR framework, which enables dynamic orchestration of parallel and serial computation; achieved SOTA (e.g., 83.4% vs. 60.0% on Countdown)
TinyZero (2025 open-source project): first small-scale reproduction of reasoning models; 3B base LM with self-verification and search abilities; >10K GitHub stars; featured in CNBC, The Independent, etc.
Contributed challenging ML questions to Humanity's Last Exam (Preprint 2025)
Training Software Engineering Agents and Verifiers with SWE-Gym (ICML 2025): >5M downloads