🤖 AI Summary
This study investigates the intrinsic mechanisms underlying cross-domain generalization of large reasoning models (LRMs) during long chain-of-thought (Long CoT) training, positing that "abstract reasoning prototypes" constitute the foundational basis of their general reasoning capability. Methodologically, the authors introduce the first dual-prototype space grounded in formal logic (Prolog) and automated planning (PDDL), enabling automatic natural-language-to-prototype mapping, interpreter-based closed-loop verification, and arbitrary-scale synthesis of problems within the prototype space with guaranteed correctness. The key contributions are: (1) establishing reasoning prototypes as verifiable, scalable carriers of generalization; and (2) proposing prototype-space modeling and chain-of-thought distillation. Experiments demonstrate improvements of +4.7% on logical reasoning (Enigmata-Eval), +6.3% on planning, +4.0% on general reasoning (MMLU), and +1.0% on mathematical reasoning (AIME24). Ablation studies confirm that prototype-space learning significantly enhances generalization to structurally similar tasks.
📝 Abstract
Recent advances in Large Reasoning Models (LRMs) trained with Long Chain-of-Thought (Long CoT) reasoning have demonstrated remarkable cross-domain generalization capabilities. However, the underlying mechanisms supporting such transfer remain poorly understood. We hypothesize that cross-domain generalization arises from shared abstract reasoning prototypes -- fundamental reasoning patterns that capture the essence of problems across domains. These prototypes minimize the nuances of the representation, revealing that seemingly diverse tasks are grounded in shared reasoning structures. Based on this hypothesis, we propose ProtoReasoning, a framework that enhances the reasoning ability of LLMs by leveraging scalable and verifiable prototypical representations (Prolog for logical reasoning, PDDL for planning). ProtoReasoning features: (1) an automated prototype construction pipeline that transforms problems into corresponding prototype representations; (2) a comprehensive verification system providing reliable feedback through Prolog/PDDL interpreters; (3) the scalability to synthesize problems arbitrarily within prototype space while ensuring correctness. Extensive experiments show that ProtoReasoning achieves 4.7% improvement over baseline models on logical reasoning (Enigmata-Eval), 6.3% improvement on planning tasks, 4.0% improvement on general reasoning (MMLU), and 1.0% on mathematics (AIME24). Notably, our ablation studies confirm that learning in prototype space also demonstrates enhanced generalization to structurally similar problems compared to training solely on natural language representations, validating our hypothesis that reasoning prototypes serve as the foundation for generalizable reasoning in large language models.
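The interpreter-based closed-loop verification described above can be illustrated with a toy sketch. A real system would invoke an actual Prolog or PDDL interpreter on the generated prototype; here, purely for illustration, a minimal forward-chaining engine over ground Horn clauses stands in for the interpreter (all function and predicate names below are illustrative assumptions, not the paper's actual implementation):

```python
def forward_chain(facts, rules):
    """Derive every fact entailed by ground Horn rules of the form (body, head)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return derived

def verify(candidate_answer, facts, rules, query):
    """Closed-loop check: accept the model's answer only if the 'interpreter'
    (here, the forward-chaining engine) reaches the same entailment verdict."""
    entailed = query in forward_chain(facts, rules)
    return candidate_answer == entailed

# A prototype abstracting away surface wording: transitive reachability.
facts = {"edge(a,b)", "edge(b,c)"}
rules = [
    (("edge(a,b)",), "path(a,b)"),
    (("edge(b,c)",), "path(b,c)"),
    (("path(a,b)", "path(b,c)"), "path(a,c)"),
]
print(verify(True, facts, rules, "path(a,c)"))   # the answer "yes" is accepted
print(verify(True, facts, rules, "path(c,a)"))   # the answer "yes" is rejected
```

Because correctness is checked symbolically rather than against human labels, problems in the prototype space can be synthesized at arbitrary scale while the verdict stays reliable, which is what makes the prototype representation a scalable training signal.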