🤖 AI Summary
Large language models (LLMs) still face significant limitations in complex mathematical reasoning—particularly tasks requiring multi-step deduction and deep conceptual understanding.
Method: We propose a multi-stage optimization framework to develop an open-source mathematical reasoning model series, integrating base-model pretraining, instruction tuning, and long chain-of-thought (CoT) modeling. Specifically: (i) we pretrain on a high-quality, 210B-token corpus and then conduct staged supervised fine-tuning with CoT-augmented training; (ii) we design a progressive reinforcement learning curriculum built on the GRPO algorithm with native 32K-token context support.
Contribution/Results: Our model achieves state-of-the-art performance among open-source models of comparable scale on competition-level mathematics benchmarks (e.g., MATH, AMC), outperforming both O1-mini and GPT-4o. To our knowledge, this is the first open-source system demonstrating scalable, robust, and deep mathematical reasoning capability.
📝 Abstract
Mathematical reasoning is a cornerstone of artificial general intelligence and a primary benchmark for evaluating the capabilities of Large Language Models (LLMs). While state-of-the-art models show promise, they often falter when faced with complex problems that demand deep conceptual understanding and intricate, multi-step deliberation. To address this challenge, we introduce JT-Math-8B, a series of open-source models comprising base, instruct, and thinking versions, built upon a systematic, multi-stage optimization framework. Our pre-training corpus is a high-quality, 210B-token dataset curated through a dedicated data pipeline that uses model-based validation to ensure quality and diversity. The Instruct Model is optimized for direct, concise answers through Supervised Fine-Tuning (SFT) and a GRPO-based reinforcement learning (RL) method. The Thinking Model is trained for complex problem-solving using a Long Chain-of-Thought (Long CoT) approach, combining SFT with a novel, multi-stage RL curriculum that progressively increases task difficulty and context length up to 32K tokens. JT-Math-8B achieves state-of-the-art results among open-source models of similar size, surpassing prominent models like OpenAI's O1-mini and GPT-4o, and demonstrating superior performance on competition-level mathematics.
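The GRPO-based RL mentioned above scores a group of sampled completions per prompt and normalizes each completion's reward against its group, avoiding a learned value model. The sketch below illustrates only this group-relative advantage step in generic form; the function name, group size, and binary correctness reward are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of GRPO-style group-relative advantages (assumption:
# generic form of the algorithm, not JT-Math-8B's exact recipe).
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each sampled completion's reward against its own group."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    # Completions better than the group mean get positive advantage.
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: 4 completions for one math prompt, reward 1.0 if the final
# answer is correct, 0.0 otherwise (a common verifiable-reward setup).
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

These advantages then weight the policy-gradient update for each token of the corresponding completion; group normalization keeps the signal well-scaled even when most samples in a group succeed or fail together.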