Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement Learning

📅 2025-02-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the weak generalization of large language models (LLMs) on complex logical reasoning tasks such as logic puzzles, along with their lack of reflective reasoning and verification. It proposes a rule-based reinforcement learning (RL) framework with four components: (1) a synthesized logic puzzle dataset with controllable complexity and automated ground-truth validation; (2) a system prompt that emphasizes an explicit thinking process before the answer; (3) a stringent format reward function that penalizes outputs for taking shortcuts; and (4) a straightforward training recipe that converges stably. Trained on only 5K logic problems, a 7B-parameter LLM develops advanced reasoning skills, including reflection, verification, and summarization, that are absent from the training corpus. It outperforms supervised fine-tuning baselines on difficult mathematical benchmarks (e.g., AIME, AMC), with gains in both reasoning stability and answer accuracy.
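The automated ground-truth validation mentioned above can be illustrated with a small brute-force checker. This is only a sketch under stated assumptions: the summary does not say which puzzle family is used, so the knights-and-knaves setup here (knights always tell the truth, knaves always lie) is an illustrative choice, and the names `solve` and `is_valid_puzzle` are hypothetical.

```python
from itertools import product


def solve(names, statements):
    """Return every assignment consistent with all statements.

    Assumption for illustration: a knights-and-knaves style puzzle.
    `statements` maps each speaker to a function that takes a world
    (dict of name -> True if knight) and returns the truth value of
    that speaker's claim. A knight's claim must be true and a knave's
    claim must be false, so the speaker's role must equal the claim.
    """
    solutions = []
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        if all(world[speaker] == claim(world)
               for speaker, claim in statements.items()):
            solutions.append(world)
    return solutions


def is_valid_puzzle(names, statements):
    """A generated puzzle is kept only if it has exactly one solution."""
    return len(solve(names, statements)) == 1


# Hypothetical puzzle: A says "B is a knave"; B says "we are both knights".
example = {
    "A": lambda w: not w["B"],
    "B": lambda w: w["A"] and w["B"],
}
print(solve(["A", "B"], example))
```

Because every candidate assignment is enumerated, the same routine gives both the reference answer for reward computation and a filter that rejects ambiguous or unsolvable puzzles. Enumeration grows as 2^n in the number of characters, which is acceptable only for small puzzles.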

📝 Abstract

Inspired by the success of DeepSeek-R1, we explore the potential of rule-based reinforcement learning (RL) in large reasoning models. To analyze reasoning dynamics, we use synthetic logic puzzles as training data due to their controllable complexity and straightforward answer verification. We make some key technical contributions that lead to effective and stable RL training: a system prompt that emphasizes the thinking and answering process, a stringent format reward function that penalizes outputs for taking shortcuts, and a straightforward training recipe that achieves stable convergence. Our 7B model develops advanced reasoning skills, such as reflection, verification, and summarization, that are absent from the logic corpus. Remarkably, after training on just 5K logic problems, it demonstrates generalization abilities to the challenging math benchmarks AIME and AMC.
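A rule-based format reward of the kind the abstract describes might look roughly like the sketch below. Everything specific in it is an assumption rather than the authors' implementation: the `<think>` and `<answer>` tag names, the exact-match answer check, and the reward magnitudes are all hypothetical values chosen for illustration.

```python
import re

# Assumed output layout: one <think> block followed by one <answer> block.
PATTERN = re.compile(
    r"\A\s*<think>(.+?)</think>\s*<answer>(.+?)</answer>\s*\Z",
    re.DOTALL,
)
TAGS = ("<think>", "</think>", "<answer>", "</answer>")


def format_reward(output: str) -> float:
    """Return +1.0 for a well-formed output, -1.0 otherwise (assumed values)."""
    # Each tag must appear exactly once, so repeated or missing sections fail.
    if any(output.count(tag) != 1 for tag in TAGS):
        return -1.0
    match = PATTERN.match(output)
    if match is None:
        return -1.0
    # Empty reasoning or an empty answer counts as a shortcut.
    if not match.group(1).strip() or not match.group(2).strip():
        return -1.0
    return 1.0


def total_reward(output: str, expected: str) -> float:
    """Combine the format reward with a hypothetical answer reward."""
    fmt = format_reward(output)
    if fmt < 0:
        return fmt  # malformed outputs never receive answer credit
    answer = PATTERN.match(output).group(2).strip()
    return fmt + (2.0 if answer == expected else -1.5)


good = "<think>A's claim forces B to be a knave.</think><answer>A: knight</answer>"
print(total_reward(good, "A: knight"))
```

Gating the answer reward on the format check is one way to discourage shortcuts: an output that skips the thinking section is penalized even if its final answer happens to be correct.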

Problem

Research questions and friction points this paper is trying to address.

Enhance reasoning in large language models

Use rule-based reinforcement learning

Improve generalization in math benchmarks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Rule-based reinforcement learning

Synthetic logic puzzles training

Stringent format reward function
