🤖 AI Summary
This work addresses the limited complex reasoning capabilities of large language models in STEM domains by proposing a data-algorithm co-design paradigm. The authors construct a high-quality dataset of 10 million long chain-of-thought samples and develop a five-stage data engine (annotation, deduplication, decontamination, distillation, and stratified sampling), alongside a failure-driven post-training framework that integrates open-source and synthetically generated data to optimize both supervised fine-tuning and reinforcement learning. Evaluated at the 8B-parameter scale, the approach achieves an average gain of 4.68% over the strongest baseline on established STEM benchmarks. The project publicly releases both 8B and 32B models, together with the full 10M dataset and a downsampled 2.2M subset.
📝 Abstract
We present Logics-STEM, a state-of-the-art reasoning model fine-tuned on the Logics-STEM-SFT-Dataset, a high-quality and diverse dataset at 10M scale that is among the largest open-source long chain-of-thought corpora. Logics-STEM targets reasoning tasks in the domains of Science, Technology, Engineering, and Mathematics (STEM), and exhibits exceptional performance on STEM-related benchmarks, with an average improvement of 4.68% over the next-best model at the 8B scale. We attribute these gains to our data-algorithm co-design engine, in which the data and the algorithm are jointly optimized to fit the gold-standard distribution underlying reasoning. Data-wise, the Logics-STEM-SFT-Dataset is constructed by a meticulously designed five-stage data curation engine that ensures quality, diversity, and scalability: annotation, deduplication, decontamination, distillation, and stratified sampling. Algorithm-wise, our failure-driven post-training framework applies targeted knowledge retrieval and data synthesis around the model's failure regions after the first Supervised Fine-tuning (SFT) stage, guiding the second-stage SFT or the subsequent reinforcement learning (RL) to better fit the target distribution. The strong empirical performance of Logics-STEM reveals the vast potential of combining large-scale open-source data with carefully designed synthetic data, underscoring the critical role of data-algorithm co-design in enhancing reasoning capabilities through post-training. We make both the Logics-STEM models (8B and 32B) and the Logics-STEM-SFT-Dataset (10M and downsampled 2.2M versions) publicly available to support future research in the open-source community.
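To make the five-stage curation engine concrete, the following is a minimal toy sketch of how such a pipeline could be wired together. All function names, the sample format, and the stage logic (hash-based deduplication, benchmark-overlap decontamination, per-domain quota sampling) are illustrative assumptions, not the authors' actual implementation.

```python
import hashlib
import random
from collections import defaultdict

def annotate(samples):
    # Stage 1 (toy): tag each sample with a domain label.
    for s in samples:
        s["domain"] = "math" if any(c.isdigit() for c in s["question"]) else "science"
    return samples

def deduplicate(samples):
    # Stage 2 (toy): drop exact duplicates via content hashing.
    seen, out = set(), []
    for s in samples:
        h = hashlib.sha256(s["question"].encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            out.append(s)
    return out

def decontaminate(samples, benchmark_questions):
    # Stage 3 (toy): remove samples that overlap evaluation benchmarks.
    banned = set(benchmark_questions)
    return [s for s in samples if s["question"] not in banned]

def distill(samples, teacher):
    # Stage 4 (toy): attach a long chain-of-thought trace from a teacher model.
    for s in samples:
        s["cot"] = teacher(s["question"])
    return samples

def stratified_sample(samples, per_domain, seed=0):
    # Stage 5 (toy): downsample with an equal quota per domain for balance.
    rng = random.Random(seed)
    by_domain = defaultdict(list)
    for s in samples:
        by_domain[s["domain"]].append(s)
    out = []
    for group in by_domain.values():
        rng.shuffle(group)
        out.extend(group[:per_domain])
    return out

def curate(samples, benchmark_questions, teacher, per_domain):
    # Run the five stages in order: annotate -> dedup -> decontaminate
    # -> distill -> stratified sampling.
    samples = annotate(samples)
    samples = deduplicate(samples)
    samples = decontaminate(samples, benchmark_questions)
    samples = distill(samples, teacher)
    return stratified_sample(samples, per_domain)
```

At production scale each stage would of course be distributed and model-backed (e.g. LLM annotators, semantic deduplication, a strong teacher for distillation), but the stage ordering and interfaces stay the same shape.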