AI Summary
This work proposes a framework that combines symbolic problem representations (e.g., SymPy or SMT formulations) with a closed-loop adaptive mechanism to address limitations of existing approaches to generating mathematical training data for reinforcement learning. Current methods do not adapt to the model's evolving capabilities and offer limited control over problem structure. The proposed approach models problems in a symbolic space, which gives structural control and verifiable solutions, while adjusting problem difficulty to match the learner's current proficiency. Because mathematical reasoning is decoupled from linguistic expression, modification strategies can be learned through prompt optimization within the symbolic space. Empirical results indicate that the method improves the mathematical problem-solving performance of small open-weights language models and yields more diverse training data with precise structural control.
Abstract
We present a method for generating training data for reinforcement learning with verifiable rewards to improve small open-weights language models on mathematical tasks. Existing data generation approaches rely on open-loop pipelines and fixed modifications that do not adapt to the model's capabilities. Furthermore, they typically operate directly on word problems, limiting control over problem structure. To address this, we perform modifications in a symbolic problem space, representing each problem as a set of symbolic variables and constraints (e.g., via algebraic frameworks such as SymPy or SMT formulations). This representation enables precise control over problem structure and automatic generation of ground-truth solutions, and it decouples mathematical reasoning from linguistic realization. We also show that this results in more diverse generations. To adapt the problem difficulty to the model, we introduce a closed-loop framework that learns modification strategies through prompt optimization in symbolic space. Experimental results demonstrate that both adaptive problem generation and symbolic representation modifications contribute to improving the model's mathematical problem-solving ability.
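To make the idea of a symbolic problem space concrete, here is a minimal sketch, not the authors' implementation. It assumes problems are systems of linear equalities and uses a small exact-arithmetic solver from the Python standard library in place of SymPy or an SMT solver. The names `Constraint`, `Problem`, `solve`, and `add_linked_variable`, and the apples-and-pears seed problem, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class Constraint:
    """Linear equality: sum(coeffs[v] * v for v in coeffs) == rhs."""
    coeffs: dict
    rhs: Fraction


@dataclass
class Problem:
    """A problem is a tuple of variable names plus a tuple of constraints."""
    variables: tuple
    constraints: tuple


def solve(problem):
    """Return the unique solution as {variable: Fraction}, or None otherwise.

    Plain Gauss-Jordan elimination over exact rationals (a stand-in for
    SymPy/SMT, which is an assumption of this sketch).
    """
    names = list(problem.variables)
    n = len(names)
    rows = [
        [Fraction(c.coeffs.get(v, 0)) for v in names] + [Fraction(c.rhs)]
        for c in problem.constraints
    ]
    if len(rows) != n:
        return None  # this sketch only handles square systems
    for col in range(n):
        pivot = next((r for r in range(col, n) if rows[r][col] != 0), None)
        if pivot is None:
            return None  # singular system: no unique ground truth
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [x / lead for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return {v: rows[i][n] for i, v in enumerate(names)}


def add_linked_variable(problem, new_var, base_var, multiplier, offset):
    """Structural edit: add new_var with new_var == multiplier * base_var + offset."""
    link = Constraint({new_var: 1, base_var: -multiplier}, Fraction(offset))
    return Problem(problem.variables + (new_var,), problem.constraints + (link,))


# Hypothetical seed problem: a + p = 12 and a - p = 4.
seed = Problem(
    variables=("a", "p"),
    constraints=(
        Constraint({"a": 1, "p": 1}, Fraction(12)),
        Constraint({"a": 1, "p": -1}, Fraction(4)),
    ),
)
seed_solution = solve(seed)

# Harder variant: one extra variable and one extra reasoning step (o = 2a + 1).
harder = add_linked_variable(seed, "o", "a", multiplier=2, offset=1)
harder_solution = solve(harder)

print(seed_solution)
print(harder_solution)
```

Because the edit is applied to the constraint set rather than to the word problem, the ground-truth answer for the modified problem comes from re-solving the system, and the wording can be generated separately afterwards. In a closed-loop setting, the choice of which edit to apply would be driven by the model's observed performance; that part is not sketched here.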