Theorem-Validated Reverse Chain-of-Thought Problem Generation for Geometric Reasoning

📅 2024-10-23
📈 Citations: 13
Influential: 3
🤖 AI Summary
Problem: Large multimodal models (LMMs) show limited geometric reasoning performance because chain-of-thought (CoT) vision-language training data is scarce, low in diversity, and insufficiently precise. Method: This paper proposes TR-CoT, a theorem-validated reverse CoT generation framework. It introduces a theorem-guided reverse reasoning paradigm that combines formal geometric theorem modeling, structured diagram description generation, multi-granularity attribute-text alignment verification, and bidirectional cross-validation to detect logical fallacies and improve reasoning consistency. Contribution/Results: Experiments show that the best TR-CoT models surpass the baselines by 10.1% on MathVista and 4.7% on GeoQA; logical consistency improves by 24.5%; theorem-type coverage expands; and the models outperform advanced closed-source models including GPT-4o.

📝 Abstract
Large Multimodal Models (LMMs) face limitations in geometric reasoning due to insufficient Chain of Thought (CoT) image-text training data. While existing approaches leverage template-based or LLM-assisted methods for geometric CoT data creation, they often face challenges in achieving both diversity and precision. To bridge this gap, we introduce a two-stage Theorem-Validated Reverse Chain-of-Thought Reasoning Synthesis (TR-CoT) framework. The first stage, TR-Engine, synthesizes theorem-grounded geometric diagrams with structured descriptions and properties. The second stage, TR-Reasoner, employs reverse reasoning to iteratively refine question-answer pairs by cross-validating geometric properties and description fragments. Our approach expands theorem-type coverage, corrects long-standing misunderstandings, and enhances geometric reasoning. Fine-grained CoT improves theorem understanding and increases logical consistency by 24.5%. Our best models surpass the baselines in MathVista and GeoQA by 10.1% and 4.7%, outperforming advanced closed-source models like GPT-4o.
Problem

Research questions and friction points this paper is trying to address.

Addressing insufficient CoT image-text data for LMMs' geometric reasoning
Improving diversity and precision in geometric CoT data creation
Enhancing theorem understanding and logical consistency in geometric reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage TR-CoT framework for geometric reasoning
TR-Engine synthesizes theorem-grounded diagrams
TR-Reasoner refines QA pairs via reverse reasoning
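
The reverse-generation idea can be illustrated with a toy sketch. This is not the paper's implementation: the right-triangle generator, the `Figure` type, and the functions `synthesize_right_triangle`, `reverse_generate`, and `validate` are hypothetical names introduced only for illustration. The sketch assumes a figure is built from a known theorem (here the Pythagorean theorem), questions are written backward from properties that are already known to hold, and a pair is kept only if an independent recomputation agrees with the stored answer.

```python
import math
from dataclasses import dataclass


@dataclass
class Figure:
    description: list[str]        # structured description fragments
    properties: dict[str, float]  # properties derived from the theorem


def synthesize_right_triangle(a: float, b: float) -> Figure:
    """Hypothetical stand-in for a theorem-grounded figure generator."""
    c = math.hypot(a, b)  # Pythagorean theorem
    return Figure(
        description=[
            "Triangle ABC has a right angle at C.",
            f"AC = {a:g}.",
            f"BC = {b:g}.",
        ],
        properties={"AB": c, "area": 0.5 * a * b},
    )


def reverse_generate(figure: Figure) -> list[dict]:
    """Start from each known property (the answer) and write a question for it."""
    context = " ".join(figure.description)
    questions = {
        "AB": "What is the length of AB?",
        "area": "What is the area of triangle ABC?",
    }
    return [
        {"question": f"{context} {questions[name]}", "answer": value, "target": name}
        for name, value in figure.properties.items()
    ]


def validate(qa: dict, a: float, b: float) -> bool:
    """Recompute the target independently; keep the pair only if it agrees."""
    recomputed = {"AB": math.sqrt(a * a + b * b), "area": a * b / 2}[qa["target"]]
    return math.isclose(qa["answer"], recomputed)


figure = synthesize_right_triangle(3, 4)
pairs = [qa for qa in reverse_generate(figure) if validate(qa, 3, 4)]
for qa in pairs:
    print(qa["question"], "->", qa["answer"])
```

Because every answer is fixed before its question is written, a generated pair can be checked mechanically, which is the property the toy validation step is meant to show.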

Linger Deng
Huazhong University of Science and Technology
Computer Vision, Multimodal Large Language Models, Optical Character Recognition

Yuliang Liu
Huazhong University of Science and Technology

Bohan Li
Department of Computer Vision Technology, Baidu Inc.

Dongliang Luo
Huazhong University of Science and Technology

Liang Wu
Department of Computer Vision Technology, Baidu Inc.

Chengquan Zhang
Unknown affiliation
Computer Vision, Application of Deep Learning

Pengyuan Lyu
Huazhong University of Science and Technology
Computer Vision

Ziyang Zhang
Huazhong University of Science and Technology

Gang Zhang
Tsinghua University
Computer Vision

Errui Ding
Baidu Inc.
Computer Vision, Machine Learning

Yingying Zhu
Huazhong University of Science and Technology

Xiang Bai
Huazhong University of Science and Technology (HUST)
Computer Vision, OCR