🤖 AI Summary
Large multimodal models (LMMs) show limited geometric reasoning performance because chain-of-thought (CoT) vision-language data is scarce, low in diversity, and imprecise. Method: This paper proposes TR-CoT, a theorem-verification-driven reverse CoT generation framework. It introduces a theorem-guided reverse reasoning paradigm that combines formal geometric theorem modeling, structured diagram-description generation, multi-granularity attribute-text alignment verification, and bidirectional cross-validation to detect logical fallacies and improve reasoning consistency. Contribution/Results: Experiments show that TR-CoT achieves absolute gains of +10.1% on MathVista and +4.7% on GeoQA over strong baselines, improves logical consistency by 24.5%, substantially broadens theorem-type coverage, and outperforms advanced closed-source models including GPT-4o.
📝 Abstract
Large Multimodal Models (LMMs) face limitations in geometric reasoning due to insufficient Chain-of-Thought (CoT) image-text training data. While existing approaches leverage template-based or LLM-assisted methods for geometric CoT data creation, they often struggle to achieve both diversity and precision. To bridge this gap, we introduce a two-stage Theorem-Validated Reverse Chain-of-Thought Reasoning Synthesis (TR-CoT) framework. The first stage, TR-Engine, synthesizes theorem-grounded geometric diagrams with structured descriptions and properties. The second stage, TR-Reasoner, employs reverse reasoning to iteratively refine question-answer pairs by cross-validating geometric properties against description fragments. Our approach expands theorem-type coverage, corrects long-standing misunderstandings, and enhances geometric reasoning. Fine-grained CoT improves theorem understanding and increases logical consistency by 24.5%. Our best models surpass the baselines on MathVista and GeoQA by 10.1% and 4.7%, respectively, outperforming advanced closed-source models such as GPT-4o.
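The TR-Reasoner stage described above can be pictured as a filtering loop: candidate question-answer pairs survive only if their reasoning steps are grounded in the diagram's stated properties (forward check) and collectively cover the properties the question requires (reverse check). The sketch below is an illustrative assumption of that bidirectional cross-validation, not the paper's implementation; all data structures (`steps`, `required`) and function names are hypothetical.

```python
# Hypothetical sketch of TR-Reasoner-style bidirectional cross-validation.
# Properties are plain strings; a QA pair lists its reasoning steps and the
# properties the question requires. None of this mirrors the paper's code.

def properties_support_steps(properties, steps):
    """Forward check: every reasoning step must cite a known property."""
    return all(step in properties for step in steps)

def steps_recover_required(properties, steps, required):
    """Reverse check: the steps must touch every property the question needs."""
    return required.issubset(set(steps) & set(properties))

def cross_validate(properties, qa_pair):
    """Keep a QA pair only if both validation directions pass."""
    steps = qa_pair["steps"]
    return (properties_support_steps(properties, steps)
            and steps_recover_required(properties, steps, qa_pair["required"]))

def refine(properties, candidates):
    """One refinement pass: discard candidates that fail cross-validation."""
    return [qa for qa in candidates if cross_validate(properties, qa)]

# Toy usage with made-up geometric properties:
props = {"AB = BC", "angle ABC = 60"}
good = {"steps": ["AB = BC", "angle ABC = 60"], "required": {"AB = BC"}}
bad = {"steps": ["AB = CD"], "required": {"AB = BC"}}  # ungrounded step
kept = refine(props, [good, bad])  # only `good` survives
```

In a real pipeline the reverse check would involve an LLM judging whether the answer is derivable from the description fragments; the set operations here merely stand in for that judgment.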