Seed-Thinking-v1.5: Advancing Superb Reasoning Models with Reinforcement Learning

📅 2025-04-10
📈 Citations: 1
Influential: 0
🤖 AI Summary
Problem: Large language models (LLMs) often show limited reasoning ability and weak cross-domain generalization on STEM, programming, and general-purpose tasks. Method: We propose Seed-Thinking-v1.5, a relatively small sparse Mixture-of-Experts (MoE) model (20B activated parameters, 200B total parameters), trained with reinforcement learning-driven chain-of-thought optimization, multi-stage thought distillation, and alignment training so that it reasons before it answers. Contribution/Results: We build two human-curated internal benchmarks, BeyondAIME (advanced mathematical reasoning) and Codeforces (competitive algorithmic programming), both of which will be publicly released. The model scores 86.7 on AIME 2024, 55.0 on Codeforces, and 77.3 on GPQA. It also surpasses DeepSeek R1 by 8% in win rate on non-reasoning tasks, which indicates that its gains generalize beyond reasoning benchmarks.

📝 Abstract
We introduce Seed-Thinking-v1.5, capable of reasoning through thinking before responding, resulting in improved performance on a wide range of benchmarks. Seed-Thinking-v1.5 achieves 86.7 on AIME 2024, 55.0 on Codeforces and 77.3 on GPQA, demonstrating excellent reasoning abilities in STEM and coding. Beyond reasoning tasks, the method demonstrates notable generalization across diverse domains. For instance, it surpasses DeepSeek R1 by 8% in win rate on non-reasoning tasks, indicating its broader applicability. Compared to other state-of-the-art reasoning models, Seed-Thinking-v1.5 is a Mixture-of-Experts (MoE) model with a relatively small size, featuring 20B activated and 200B total parameters. As part of our effort to assess generalized reasoning, we develop two internal benchmarks, BeyondAIME and Codeforces, both of which will be publicly released to support future research.
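The size figures in the abstract (20B activated, 200B total parameters) imply that roughly one tenth of the weights take part in any single forward pass. The sketch below works through that arithmetic and illustrates generic top-k expert routing, a common mechanism in sparse MoE layers. The abstract does not state the number of experts, the value of k, or the routing rule, so `top_k_route`, the example gate logits, and k = 2 are illustrative assumptions rather than the model's actual design.

```python
import math


def activated_fraction(activated_params: float, total_params: float) -> float:
    """Share of the model's parameters that are used for a single token."""
    return activated_params / total_params


def top_k_route(gate_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Generic top-k routing (an assumption, not taken from the paper).

    Picks the k experts with the highest gate logits and returns
    (expert_index, weight) pairs, where the weights are a softmax
    over the selected logits only and therefore sum to 1.
    """
    # Indices of the k largest logits, highest first.
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # Subtract the largest selected logit for numerical stability.
    peak = max(gate_logits[i] for i in top)
    exps = [math.exp(gate_logits[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


if __name__ == "__main__":
    # Figures from the abstract: 20B activated out of 200B total parameters.
    print(f"{activated_fraction(20e9, 200e9):.0%}")  # 10%
    # Hypothetical gate logits for four experts; only two are selected.
    print(top_k_route([0.1, 2.0, -1.0, 1.5], k=2))
```

Under these assumptions, the per-token compute cost tracks the activated parameters, while the total parameter count mainly determines how much memory the weights need.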
Problem

Research questions and friction points this paper is trying to address.

Enhancing reasoning models with reinforcement learning for better performance.
Demonstrating superior reasoning in STEM and coding benchmarks.
Achieving broader applicability beyond reasoning tasks with MoE architecture.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning improves reasoning performance.
Relatively small Mixture-of-Experts model with 20B activated and 200B total parameters.
Generalizes beyond STEM and coding, including non-reasoning tasks.
Authors

Jiaze Chen · Bytedance · Natural Language Processing
Tiantian Fan · Bytedance · machine learning
Xin Liu
Lingjun Liu
Zhiqi Lin · Bytedance Inc. · Distributed AI Systems for Large Models
Mingxuan Wang
Chengyi Wang · Bytedance Inc · Large language model
Xiangpeng Wei · Bytedance Seed · LLMs, Reasoning models, Agents
Wenyuan Xu · Professor, IEEE Fellow, Zhejiang University, College of EE · Wireless Network Security, Embedded System Security, Analog Cyber Security, IoT Security
Yufeng Yuan
Yu Yue
Lin Yan
Qiying Yu · Tsinghua University · Multimodal Learning, Self-supervised Learning, Large Models
Xiaochen Zuo
Chi Zhang
Ruofei Zhu
Zhecheng An
Zhihao Bai
Yu Bao
Xingyan Bin
Jiangjie Chen · ByteDance Seed · NLP, Machine Reasoning, Large Language Models, Autonomous Agent
Feng Chen
Hongmin Chen
Riwei Chen
Liangqiang Chen
Zixin Chen · HKUST VisLab · Human-AI Collaboration, Visual Analytics, LLM for Education
Jinsong Chen · Central China Normal University · Graph Representation Learning, Graph Data Mining, AI for Education
Siyan Chen
Kaiyuan Chen · Bytedance · LLM, Scaling Law, AI4Weather, Video Generation
Zhi Chen
Jin Chen
Jiecao Chen · Bytedance Seed · LLM, reasoning, agent, tool use, memory
Jinxin Chi
Weinan Dai · Tsinghua University · Artificial Intelligence, Large Language Models, Reinforcement Learning
Ning Dai · Oregon State University · Large Language Models, Natural Language Processing, Computational Biology
Jiahui Dai
Shihan Dou · Fudan University · LLMs, Code LMs, RL, Alignment
Yantao Du
Zhengyin Du · ByteDance Seed · Large Language Model, Multi-modal Learning
Jianhui Duan
Chen Dun
Ting-Han Fan
Jiazhan Feng · University of Oxford; PhD at Peking University · Natural Language Processing, Large Language Models, Multimodal Agent
Junda Feng · Unknown affiliation
Ziyuan Feng
Yuwei Fu · McGill University · reinforcement learning
Wenqi Fu
Hanjie Fu
Hao Ge
Hongyi Guo · Northwestern University · Large Language Model, Reinforcement Learning
Mingji Han · Tencent · database, networking, system
Li Han
Wenhao Hao
Xintong Hao
Qianyu He · Fudan University · Large Language Model, Reasoning, Instruction Following, Creative Generation
Jerry He
Feng He
Wen Heng
Zehua Hong
Qingli Hou
Liang Hu
Shengding Hu · Tsinghua University · LLM, Artificial Super Intelligence
Nan Hu
Kai Hua
Qi Huang
Ziyue Huang · The Hong Kong University of Science and Technology
Hongzhi Huang
Zihao Huang
Ting Huang
Wenhao Huang
Wei Jia
Bin Jia
Xiaoying Jia
Yuhua Jiang · Tsinghua University · reinforcement learning
Haobin Jiang · PhD student, Peking University · reinforcement learning, multimodal model
Ziheng Jiang · Research Scientist, ByteDance · Systems, Machine Learning
Kaihua Jiang
Chengquan Jiang
Jianpeng Jiao
Xiaoran Jin
Xing Jin · PhD Candidate of Computer Science, Syracuse University · Mobile Security, Web Security, Data Mining
Xunhao Lai · Peking University · Machine Learning, Natural Language Processing, Large language model
Zheng Li
Xiang Li
Liyi Li · Sr. Materials and Failure Analysis Engineer, Intel Corporation · Microfabrication, Microelectronic Packaging, Microelectronic Materials and Failure Analysis
Hongkai Li
Shengxian Wan
Ya Wang
Yunshui Li · Seed Team | Prev. Qwen, SIAT · Natural Language Processing, Multimodal (Vision-and-Language) Representation Learning
Chenggang Li
Niuniu Li
Siyu Li · University of Illinois at Chicago · Robotics, Micro-robot swarms, Human-robot Interaction, Control and Motion Planning
Xi Li
Xiao Li
Aoyan Li
Yuntao Li · Peking University
Nianning Liang
Xinnian Liang · Bytedance Inc. · Large Language Model
Haibin Lin · Bytedance · Machine Learning Systems, Natural Language Processing
Weijian Lin · Carnegie Mellon University · Machine Learning, Recommendation Systems, Large Language Model
Ye Lin
Zhicheng Liu
Guanlin Liu · ByteDance · Language Model, Reinforcement Learning, Machine learning, Statistics
Chenxiao Liu · Peking University
Yan Liu
Gaohong Liu
Juncai Liu
Chundian Liu
Deyi Liu
Kaibo Liu
Siyao Liu
Qi Liu
Yongfei Liu
Kang Liu
Gan Liu
Boyi Liu · Snowflake AI Research · Reinforcement Learning, LLM, AI Agent
Rui Long
Weiqiang Lou
Chenwei Lou · Harbin Institute of Technology · NLP
Xiang Luo · Nanjing University · Natural Language Processing, Task-Oriented Dialogue
Yao Luo
Caiping Lv
Heyang Lv
Bole Ma
Qianli Ma
Hongzhi Ma
Yiyuan Ma · Bytedance Seed
Jin Ma
Wenchang Ma
Tingting Ma · Bytedance Inc. · Large Language Model
Chen Mao
Qi Min
Zhenwu Nan
GU Ning
Jinxiang Ou
Haojie Pan
Renming Pang
Yanghua Peng · ByteDance Inc. · Large Language Models, Machine Learning Systems, GPU Scheduling
Tao Peng · Jilin University · natural language processing, knowledge graph
Lihua Qian
Mu Qiao
Meng Qu
Cheng Ren
Hongbin Ren
Yong Shan
Wei Shen
Ke Shen
Kai Shen · Associate Professor of Computer Science, University of Rochester · Computer Systems
Guangming Sheng · the University of Hong Kong
Jinlong Shi
Wenlei Shi · Microsoft Research Asia · reinforcement learning, machine learning
Guang Shi
Shuai Shuai Cao
Yuxin Song · Baidu · Computer Vision, Vision-Language Model, Generative Model, Video Understanding
Zuquan Song · Bytedance
Jing Su
Yifan Sun
Tao Sun
Zewei Sun · ByteDance LLM Team (Seed) · Large Language Model, Natural Language Processing, Deep Learning
Borui Wan · The University of Hong Kong · Large Language Model, Computer Systems
Zihan Wang
Xiaohui Wang
Xi Wang
Shuguang Wang
Jun Wang
Qinlong Wang
Chenyuan Wang
Shuai Wang
Changbao Wang
Jiaqiang Wang
Shihang Wang · DAMO Academy, Alibaba Inc. · Natural Language Processing
Xuwu Wang · ByteDance
Zaiyuan Wang · ByteDance · AI, LLM, Function Call, Agent
Yuxuan Wang
Wenqi Wang
Taiqing Wang
Chengzhi Wei
Houmin Wei
Ziyun Wei
Shufa Wei
Zheng Wu
Yong-Xu Wu
Yangjun Wu
Bohong Wu · Shanghai Jiao Tong University
Shuangzhi Wu · Bytedance · Machine Translation, Deep Learning, Natural Language Processing
Jingqiao Wu
Ning Wu
Shuangzhi Wu · Bytedance · Machine Translation, Deep Learning, Natural Language Processing
Jianmin Wu
Chenguang Xi · Machine Learning Engineer, ByteDance · Code LLM
Fan Xia
Yuqiao Xian · ByteDance Seed · language model, deep learning, computer vision
Liang Xiang
Boren Xiang
Bowen Xiao
Zhen Xiao · Peking University · distributed systems, cloud computing, machine learning
Xia Xiao · Bytedance - Seed Code
Yongsheng Xiao
Chao Xin
Shulin Xin
Yuwen Xiong · University of Toronto · Computer Vision, Deep Learning
Jingjing Xu
Ziwen Xu
Chenyin Xu
Jiayi Xu
Yifan Xu
Wei Xu
Yufei Xu
Shikun Xu
Shipeng Yan
Shen Yan
Qingping Yang
Xi Yang
Tianhao Yang
Yuehang Yang
Yuan Yang
Ximing Yang
Zeyu Yang
Guang Yang
Yifan Yang
Xuesong Yao · Master of Mechanics, Peking University · Machine Learning, Large language model
Bairen Yi · Unknown affiliation · Computer Systems
Fan Yin · Research Scientist at Google Deepmind · Natural Language Processing, Machine Learning
Jianian Yin
Ziqiang Ying
Xiangyu Yu
Hongli Yu
Song Yu
Menghan Yu · ByteDance · Machine Learning
Huan Yu
Siyu Yuan
Junhong Yuan
Yutao Zeng
Tianyang Zhan
Zheng Zhang
Yun Zhang
Mofan Zhang
Wang Zhang · Tianjin University · Graph Representation Learning
Ru Zhang
Zhi Zhang
Tianqi Zhang · Oak Ridge National Laboratory · remote sensing, forest structure, wildfire risk, climate modeling, machine learning
Xinyi Zhang
Zhexi Zhang
Sijun Zhang
Wenqiang Zhang
Xiangxiang Zhang
Yongtao Zhang
Yuyu Zhang · Research Scientist, ByteDance · Machine Learning
Ge Zhang
He Zhang
Yue Zhang
Renjie Zheng · ByteDance · AI, NLP, Machine Translation
Ningxin Zheng · Bytedance AML
Zhuolin Zheng
Yaowei Zheng · Ph.D. student, Beihang University · Machine Learning, Natural Language Processing
Chen Zheng · Bytedance Inc. · Deep Learning, Natural Language Processing, Large Language Model
Xiaoyun Zhi
Wanjun Zhong · Bytedance Seed Research · NLP
Cheng Zhong
Zheng Zhong
Baoquan Zhong
Xun Zhou · Professor of Computer Science, Harbin Institute of Technology, Shenzhen (HIT-SZ) · Big data analytics, Spatial database, Spatial Data Mining, GIS, machine learning
Na Zhou
Huan Zhou · Northwestern Polytechnical University · Mobile Edge Computing, Federated Learning, Mobile Social Networks, VANETs, Data Offloading
Hang Zhu · Johns Hopkins University · Computer Systems, Machine Learning Systems, Cloud Computing
Defa Zhu · ByteDance · AGI
Wenjia Zhu
Lei Zuo · Bytedance · Natural Language Processing