Seed-Thinking-v1.5: Advancing Superb Reasoning Models with Reinforcement Learning

📅 2025-04-10

📈 Citations: 1

✨ Influential: 0

🤖 AI Summary

Problem: Large language models (LLMs) often show limited reasoning ability and weak cross-domain generalization on STEM, programming, and general-purpose tasks. Method: We propose Seed-Thinking-v1.5, a sparse Mixture-of-Experts (MoE) model of relatively small size (20B activated parameters / 200B total parameters), integrated with reinforcement learning-driven chain-of-thought optimization, multi-stage thought distillation, and alignment training to enable "think-before-answer" reasoning. Contribution/Results: We develop two internal benchmarks, BeyondAIME (focused on advanced mathematical reasoning) and Codeforces (targeting competitive algorithmic programming), both of which will be publicly released. The model scores 86.7 on AIME 2024, 55.0 on Codeforces, and 77.3 on GPQA. It also surpasses DeepSeek-R1 by 8% in win rate on non-reasoning tasks, which suggests that its gains generalize beyond reasoning benchmarks.

📝 Abstract

We introduce Seed-Thinking-v1.5, a model capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed-Thinking-v1.5 achieves 86.7 on AIME 2024, 55.0 on Codeforces, and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the model demonstrates notable generalization across diverse domains. For instance, it surpasses DeepSeek-R1 by 8% in win rate on non-reasoning tasks, indicating its broader applicability. Compared to other state-of-the-art reasoning models, Seed-Thinking-v1.5 is a Mixture-of-Experts (MoE) model with a relatively small size, featuring 20B activated and 200B total parameters. As part of our effort to assess generalized reasoning, we develop two internal benchmarks, BeyondAIME and Codeforces, both of which will be publicly released to support future research.
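
To make the "20B activated and 200B total parameters" figure concrete, the sketch below shows how a sparse MoE layer can use only a subset of its experts per token. It is a generic top-k routing illustration, not the architecture described in the paper: the function names, the four-expert router, and the choice of k = 2 are all assumptions made for the example. Only the 20B and 200B figures come from the abstract.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights.

    Hypothetical helper: illustrates generic top-k gating only.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def activated_fraction(activated_params, total_params):
    """Share of the model's parameters used for a single token."""
    return activated_params / total_params


# Figures from the abstract: 20B activated, 200B total.
fraction = activated_fraction(20e9, 200e9)

# Assumed toy router with four experts; only two are used for this token.
routes = top_k_route([0.1, 2.0, -1.0, 1.5], k=2)

print(f"activated fraction: {fraction:.0%}")
for expert, weight in routes:
    print(f"expert {expert}: weight {weight:.3f}")
```

Under these assumptions, each token touches about one tenth of the total parameters, which is the sense in which an MoE model can be large in capacity yet comparatively cheap per token.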

Problem

Research questions and friction points this paper is trying to address.

Enhancing reasoning models with reinforcement learning for better performance.

Demonstrating superior reasoning in STEM and coding benchmarks.

Achieving broader applicability beyond reasoning tasks with MoE architecture.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning enhances reasoning performance

Mixture-of-Experts model with 20B activated and 200B total parameters

Generalizes beyond STEM and coding to non-reasoning tasks
