VERIFY-RL: Verifiable Recursive Decomposition for Reinforcement Learning in Mathematical Reasoning

📅 2026-02-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing problem decomposition methods for mathematical reasoning: they often rely on heuristic strategies and offer no rigorous guarantee that subproblems are simpler, valid, or mathematically sound. The authors propose a verifiable recursive decomposition framework grounded in symbolic differentiation that enforces three formal conditions at each decomposition step: strictly decreasing structural complexity, solution containment, and derivability via formal rules. The framework introduces, for the first time, automatically verifiable decomposition criteria that enable “verification by construction,” thereby eliminating invalid decompositions at their source. The approach integrates symbolic computation with reinforcement learning and curriculum learning, using formal calculus rules to generate decompositions. In the reported experiments, accuracy on the hardest problems rises from 32% to 68%, and overall performance improves by 40% in relative terms.
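
The two reported figures measure different things, and a short calculation makes the distinction explicit. The following uses only the 32% and 68% accuracies quoted for the hardest problems; the baseline and final accuracies behind the separate 40% overall figure are not given here, so they are not reconstructed.

```latex
\begin{align*}
\text{absolute gain (hardest problems)} &= 68\% - 32\% = 36 \text{ percentage points} \\
\text{ratio (hardest problems)} &= \frac{68}{32} = 2.125 \\
\text{relative gain (hardest problems)} &= \frac{68 - 32}{32} = 1.125 = 112.5\%
\end{align*}
```

A ratio of 2.125 is consistent with the abstract's statement that accuracy on the hardest problems "more than doubles". The 40% relative improvement refers to overall performance, not to this subset.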

📝 Abstract

Training language models to solve complex mathematical problems benefits from curriculum learning, that is, progressively training on simpler subproblems. However, existing decomposition methods are often heuristic, offering no guarantees that subproblems are simpler, that solving them aids the parent task, or that their relationships are mathematically grounded. We observe that symbolic differentiation provides a natural structure for verified decomposition: calculus rules explicitly define how expressions reduce to simpler components with provable properties. We introduce Verify-RL, a framework where every parent-child decomposition satisfies three verifiable conditions: strictly decreasing structural complexity, solution containment, and formal rule derivation. Unlike heuristic methods, where a significant fraction of decompositions are invalid, our properties admit automatic verification through symbolic computation, achieving "verification by construction". Experiments demonstrate that eliminating invalid decompositions yields sizable gains: accuracy on the hardest problems more than doubles from 32% to 68%, with a 40% relative improvement overall.
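
To make the three conditions concrete, here is a minimal sketch of how a proposed decomposition of a differentiation problem might be checked automatically. It is not the authors' implementation. Everything in it is an assumption made for illustration: the tuple-based expression encoding, the helper names (`size`, `rule_children`, `diff`, `contains`, `verify`), the use of node count as the complexity measure, and the reading of "solution containment" as "each child's derivative appears as a subexpression of the parent's derivative".

```python
# Hypothetical sketch: expressions are nested tuples such as
# ("x",), ("const", 2), ("add", a, b), ("mul", a, b), ("sin", a), ("cos", a).
X = ("x",)


def const(c):
    return ("const", c)


def size(e):
    """Structural complexity (assumed measure): number of tree nodes."""
    return 1 + sum(size(a) for a in e[1:] if isinstance(a, tuple))


def rule_children(e):
    """Subproblems licensed by the calculus rule for the root operator."""
    op = e[0]
    if op in ("add", "mul"):      # sum rule / product rule
        return [e[1], e[2]]
    if op in ("sin", "cos"):      # chain rule
        return [e[1]]
    return []                     # leaves have no subproblems


def diff(e):
    """Unsimplified symbolic derivative with respect to x."""
    op = e[0]
    if op == "x":
        return const(1)
    if op == "const":
        return const(0)
    if op == "add":
        return ("add", diff(e[1]), diff(e[2]))
    if op == "mul":
        return ("add", ("mul", diff(e[1]), e[2]), ("mul", e[1], diff(e[2])))
    if op == "sin":
        return ("mul", ("cos", e[1]), diff(e[1]))
    if op == "cos":
        return ("mul", ("mul", const(-1), ("sin", e[1])), diff(e[1]))
    raise ValueError(f"unknown operator: {op}")


def contains(big, small):
    """True if `small` occurs as a subexpression of `big`."""
    if big == small:
        return True
    return any(contains(a, small) for a in big[1:] if isinstance(a, tuple))


def verify(parent, proposed):
    """Check a proposed list of child problems against the three conditions."""
    d_parent = diff(parent)
    return {
        "rule_derived": proposed == rule_children(parent),
        "strictly_simpler": all(size(k) < size(parent) for k in proposed),
        "solution_contained": all(contains(d_parent, diff(k)) for k in proposed),
    }


parent = ("mul", X, ("sin", X))        # x * sin(x)
good = [X, ("sin", X)]                 # children given by the product rule
bad = [("mul", X, parent)]             # a "subproblem" harder than its parent

print(verify(parent, good))
print(verify(parent, bad))
```

In this sketch the product-rule children pass all three checks, while the second proposal fails each of them. Because `rule_children` only ever returns direct operands of the root operator, any decomposition it generates is strictly smaller than its parent, which is one way to read "verification by construction".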

Problem

Research questions and friction points this paper is trying to address.

mathematical reasoning

problem decomposition

verifiability

reinforcement learning

symbolic computation

Innovation

Methods, ideas, or system contributions that make the work stand out.

verifiable decomposition

symbolic differentiation

reinforcement learning