🤖 AI Summary
Large language models (LLMs) still face significant limitations in complex mathematical reasoning—particularly tasks requiring multi-step deduction and deep conceptual understanding.
Method: We propose a multi-stage optimization framework to develop an open-source mathematical reasoning model series, integrating base-model pretraining, instruction tuning, and long chain-of-thought (CoT) modeling. Specifically: (i) we pretrain on a high-quality, 210B-token corpus and then conduct staged supervised fine-tuning with CoT-augmented training; (ii) we design a progressive reinforcement learning curriculum built on the GRPO algorithm with native 32K-token context support.
Contribution/Results: Our model achieves state-of-the-art performance among open-source models of comparable scale on competition-level mathematics benchmarks (e.g., MATH, AMC), outperforming both O1-mini and GPT-4o. To our knowledge, this is the first open-source system demonstrating scalable, robust, and deep mathematical reasoning capability.
📝 Abstract
Mathematical reasoning is a cornerstone of artificial general intelligence and a primary benchmark for evaluating the capabilities of Large Language Models (LLMs). While state-of-the-art models show promise, they often falter when faced with complex problems that demand deep conceptual understanding and intricate, multi-step deliberation. To address this challenge, we introduce JT-Math-8B, a series of open-source models comprising base, instruct, and thinking versions, built upon a systematic, multi-stage optimization framework. Our pre-training corpus is a high-quality, 210B-token dataset curated through a dedicated data pipeline that uses model-based validation to ensure quality and diversity. The Instruct Model is optimized for direct, concise answers through Supervised Fine-Tuning (SFT) and a GRPO-based reinforcement learning (RL) method. The Thinking Model is trained for complex problem-solving using a Long Chain-of-Thought (Long CoT) approach, combining SFT with a novel, multi-stage RL curriculum that progressively increases task difficulty and context length up to 32K tokens. JT-Math-8B achieves state-of-the-art results among open-source models of similar size, surpassing prominent models like OpenAI's O1-mini and GPT-4o, and demonstrating superior performance on competition-level mathematics.
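The GRPO-based RL mentioned above scores a group of sampled completions per prompt and normalizes each completion's reward against its group, avoiding a learned value model. The sketch below illustrates only this group-relative advantage step in generic form; the function name, group size, and binary correctness reward are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of GRPO-style group-relative advantages (assumption:
# generic form of the algorithm, not JT-Math-8B's exact recipe).
from statistics import mean, stdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each sampled completion's reward against its own group."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    # Completions better than the group mean get positive advantage.
    return [(r - mu) / (sigma + eps) for r in rewards]

# Example: 4 completions for one math prompt, reward 1.0 if the final
# answer is correct, 0.0 otherwise (a common verifiable-reward setup).
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

These advantages then weight the policy-gradient update for each token of the corresponding completion; group normalization keeps the signal well-scaled even when most samples in a group succeed or fail together.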