Logics-STEM: Empowering LLM Reasoning via Failure-Driven Post-Training and Document Knowledge Enhancement

📅 2026-01-04
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited complex-reasoning capabilities of large language models in STEM domains by proposing a data-algorithm co-design paradigm. The authors construct a high-quality dataset of 10 million long chain-of-thought samples via a five-stage data engine (annotation, deduplication, contamination removal, distillation, and stratified sampling), and pair it with a failure-driven post-training framework that integrates open-source and synthetically generated data to optimize both supervised fine-tuning and reinforcement learning. On an 8B-parameter model, the approach achieves an average gain of 4.68% over the strongest baseline across established STEM benchmarks. The project publicly releases 8B and 32B models, together with the full 10M dataset and a downsampled 2.2M subset.
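
The five-stage data engine described above can be pictured as a simple sequential pipeline. Below is a minimal sketch of that flow; every stage function and field name (annotate, deduplicate, teacher, per_stratum, and so on) is an illustrative assumption, not the authors' released implementation.

```python
# Hypothetical sketch of the five-stage data curation engine:
# annotation -> deduplication -> decontamination -> distillation -> stratified sampling.
import hashlib
import random
from collections import defaultdict

def annotate(samples):
    # Assumption: tag each sample with a STEM domain and difficulty label.
    for s in samples:
        s.setdefault("domain", "math")
        s.setdefault("difficulty", "medium")
    return samples

def deduplicate(samples):
    # Drop exact duplicates by hashing the question text.
    seen, out = set(), []
    for s in samples:
        h = hashlib.md5(s["question"].encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            out.append(s)
    return out

def decontaminate(samples, benchmark_questions):
    # Remove samples whose question appears in an evaluation benchmark.
    banned = {q.strip().lower() for q in benchmark_questions}
    return [s for s in samples if s["question"].strip().lower() not in banned]

def distill(samples, teacher):
    # Assumption: a teacher model generates the long chain-of-thought trace.
    for s in samples:
        s["cot"] = teacher(s["question"])
    return samples

def stratified_sample(samples, per_stratum):
    # Keep at most `per_stratum` samples per (domain, difficulty) bucket.
    buckets = defaultdict(list)
    for s in samples:
        buckets[(s["domain"], s["difficulty"])].append(s)
    out = []
    for bucket in buckets.values():
        random.shuffle(bucket)
        out.extend(bucket[:per_stratum])
    return out

def curate(raw, benchmark_questions, teacher, per_stratum=1000):
    # Run the five stages in order over the raw corpus.
    data = annotate(raw)
    data = deduplicate(data)
    data = decontaminate(data, benchmark_questions)
    data = distill(data, teacher)
    return stratified_sample(data, per_stratum)
```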

📝 Abstract
We present Logics-STEM, a state-of-the-art reasoning model fine-tuned on the Logics-STEM-SFT-Dataset, a high-quality and diverse 10M-scale dataset that is one of the largest open-source long chain-of-thought corpora. Logics-STEM targets reasoning tasks in Science, Technology, Engineering, and Mathematics (STEM) and delivers exceptional performance on STEM-related benchmarks, with an average improvement of 4.68% over the next-best model at the 8B scale. We attribute the gains to our data-algorithm co-design engine, in which data and algorithm are jointly optimized to fit the gold-standard distribution underlying reasoning. Data-wise, the Logics-STEM-SFT-Dataset is constructed by a meticulously designed five-stage data curation engine (annotation, deduplication, decontamination, distillation, and stratified sampling) that ensures quality, diversity, and scalability. Algorithm-wise, our failure-driven post-training framework performs targeted knowledge retrieval and data synthesis around the model's failure regions identified in the supervised fine-tuning (SFT) stage, and uses them to guide a second-stage SFT or reinforcement learning (RL) toward a better fit of the target distribution. The strong empirical performance of Logics-STEM reveals the vast potential of combining large-scale open-source data with carefully designed synthetic data, underscoring the critical role of data-algorithm co-design in enhancing reasoning capabilities through post-training. We make both the Logics-STEM models (8B and 32B) and the Logics-STEM-SFT-Dataset (10M and downsampled 2.2M versions) publicly available to support future research in the open-source community.
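
The failure-driven framework in the abstract amounts to a loop: fine-tune, find where the model still fails, retrieve knowledge and synthesize data around those failures, then run a second SFT stage or RL on the targeted data. The sketch below shows only that control flow under stated assumptions; all helpers (run_sft, run_rl, retrieve_knowledge, synthesize_examples) and the model.solves method are hypothetical placeholders, not the paper's API.

```python
# Hypothetical control flow for failure-driven post-training as described
# in the abstract. Every helper here is an injected placeholder.

def failure_driven_post_training(model, sft_data, eval_set,
                                 retrieve_knowledge, synthesize_examples,
                                 run_sft, run_rl, use_rl=False):
    # Stage 1: standard supervised fine-tuning on the curated corpus.
    model = run_sft(model, sft_data)

    # Probe the fine-tuned model and collect the problems it still gets wrong.
    failures = [ex for ex in eval_set if not model.solves(ex)]

    # Build targeted data around the failure regions: retrieve relevant
    # documents, then synthesize new training examples from them
    # (the "document knowledge enhancement" of the title).
    targeted = []
    for ex in failures:
        docs = retrieve_knowledge(ex)
        targeted.extend(synthesize_examples(ex, docs))

    # Stage 2: either a second SFT pass or RL on the failure-targeted data.
    model = run_rl(model, targeted) if use_rl else run_sft(model, targeted)
    return model
```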
Problem

Research questions and friction points this paper is trying to address.

reasoning
STEM
large language models
post-training
chain-of-thought
Innovation

Methods, ideas, or system contributions that make the work stand out.

failure-driven post-training
data-algorithm co-design
chain-of-thought reasoning
synthetic data generation
STEM reasoning
👥 Authors

Mingyu Xu
ByteDance
large language models, machine learning

Cheng Fang
Alibaba Group

Keyue Jiang
University College London
Diffusion/Flow Models, Geometric Generative Models, Statistical Machine Learning

Yuqian Zheng
Alibaba Group, Georgia Institute of Technology

Yanghua Xiao
Shanghai Key Laboratory of Data Science, Fudan University; College of Computer Science and Artificial Intelligence, Fudan University

Baojian Zhou
Shanghai Key Laboratory of Data Science, Fudan University; School of Data Science, Fudan University

Qifang Zhao
Alibaba Group

Suhang Zheng
Alibaba Group

Xiuwen Zhu
Alibaba Group

Jiyang Tang
Alibaba Group, Nankai University

Yongchi Zhao
Alibaba Group

Yijia Luo
Alibaba Group

Zhiqi Bai
Alibaba Group

Yuchi Xu
Alibaba Group

Wenbo Su
Alibaba Group

Wei Wang
Tongyi Lab, Alibaba Group
Generative Models

Bing Zhao
SRI International
Natural Language Processing, Machine Learning, Optimizations

Lin Qu
Alibaba Group

Xiaoxiao Xu
Alibaba Group