[05/2025] Released Laser, improving the AIME24 score by 6.1 points while reducing token usage by 63%.
[05/2025] Three papers accepted by ICML 2025.
[01/2025] Announced an O1/R1-style model and SimpleRL-Reason for scalable reinforcement learning in LLM reasoning.
[11/2024] Launched M-STAR (Multimodal Self-Evolving Training for Reasoning) project.
[07/2024] Introduced MathCheck, a checklist-based evaluation framework for mathematical reasoning in LLMs.
[01/2024] Paper 'What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning' accepted by ICLR 2024.
[12/2023] Released the Deita project: Deita-7B achieved 7.55 on MT-Bench, 90.06% on AlpacaEval, and 69.86 on the Open LLM Leaderboard using only 6K SFT samples and 10K preference samples.
Published multiple preprints and conference papers, including first-author and collaborative work at ICML 2025 on efficient reasoning, multimodal reasoning, hierarchical RL, and model merging.