🤖 AI Summary
Large language models (LLMs) trained with reinforcement learning (RL) suffer from low sample efficiency and perform poorly on challenging multi-step mathematical reasoning tasks.
Method: We propose QuestA, a problem-augmentation framework that injects high-quality partial solutions into RL training and dynamically adjusts problem difficulty to reduce optimization complexity and provide dense supervisory signals.
Contribution/Results: Theoretical analysis demonstrates that QuestA significantly improves sample efficiency. Empirical evaluation on a 1.5B-parameter model establishes new state-of-the-art results across multiple high-difficulty mathematical benchmarks: AIME24 (+5.3% absolute, 67.1%), AIME25 (+10.0%, 59.5%), and HMMT25 (+4.0%, 35.5%). QuestA overcomes fundamental convergence and generalization bottlenecks of conventional RL in complex reasoning, delivering a scalable framework for continual improvement of LLMs’ mathematical reasoning capabilities.
📝 Abstract
Reinforcement learning (RL) has become a key component in training large language reasoning models (LLMs). However, recent studies question its effectiveness in improving multi-step reasoning, particularly on hard problems. To address this challenge, we propose a simple yet effective strategy, Question Augmentation: introduce partial solutions during training to reduce problem difficulty and provide more informative learning signals. When applied during RL training on math reasoning tasks, our method, QuestA, improves not only pass@1 but also pass@k, particularly on problems where standard RL struggles to make progress. This enables continual improvement over strong open-source models such as DeepScaleR and OpenMath Nemotron, further enhancing their reasoning capabilities. We achieve new state-of-the-art results on math benchmarks with 1.5B-parameter models: 67.1% (+5.3%) on AIME24, 59.5% (+10.0%) on AIME25, and 35.5% (+4.0%) on HMMT25. Further, we provide theoretical explanations of why QuestA improves sample efficiency, offering a practical and generalizable pathway for expanding reasoning capability through RL.
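The core idea described above, revealing part of a reference solution to reduce a problem's difficulty and then scaling the hint back as training progresses, might be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, prompt wording, and the linear hint schedule are all assumptions made for exposition.

```python
# Illustrative sketch of question augmentation with partial solutions.
# Assumption: a reference solution is available as newline-separated steps;
# `hint_fraction` controls how much of it is revealed to the model.

def augment_question(question: str, reference_solution: str,
                     hint_fraction: float) -> str:
    """Prepend a prefix of the reference solution to lower difficulty."""
    steps = reference_solution.split("\n")
    n_hint = int(len(steps) * hint_fraction)
    hint = "\n".join(steps[:n_hint])
    if not hint:  # no hint requested: return the original problem
        return question
    return f"{question}\n\nPartial solution:\n{hint}\n\nContinue from here."


def hint_schedule(step: int, total_steps: int, start: float = 0.5) -> float:
    """Hypothetical linear anneal: shrink the hint so problems return
    to full difficulty by the end of RL training."""
    return max(0.0, start * (1 - step / total_steps))
```

In an RL loop, one would call `hint_schedule` each training step and build prompts with `augment_question`, so early rollouts see easier, partially solved problems (denser reward signal) and later rollouts see the unmodified problems.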