TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving

2025-04-22
Citations: 0
Influential: 0
AI Summary
Current multimodal geometric problem solving (GPS) approaches suffer from three key bottlenecks: insufficient multimodal fusion, logical inconsistency, and cross-modal misalignment. Moreover, prevailing AI-generated geometry datasets lack formal verification and contain substantial noise and frequent contradictions. To address these issues, we propose TrustGeoGen, a scalable data engine with four mechanisms: (1) multimodal-aligned generation of diagrams, textual descriptions, and stepwise solutions; (2) formal verification of rule-compliant reasoning paths; (3) a bootstrapping mechanism that escalates problem complexity via recursive state generation; and (4) the GeoExplore series of algorithms, which produce multi-solution variants and self-reflective backtracking traces. With this engine we construct GeoTrust-200K (training) and GeoTrust-test (evaluation), a formally verified, logically consistent geometry dataset with guaranteed modality integrity. State-of-the-art models achieve only 49.17% accuracy on GeoTrust-test. Models trained on GeoTrust generalize out of distribution to GeoQA and show markedly fewer logical inconsistencies relative to pseudo-labels annotated by OpenAI-o1.

πŸ“ Abstract
Mathematical geometric problem solving (GPS) often requires effective integration of multimodal information and verifiable logical coherence. Despite the fast development of large language models in general problem solving, GPS remains an open problem in terms of both methodology and benchmarks, especially since existing synthetic GPS benchmarks are often not self-verified and contain noise and self-contradictory information due to LLM hallucinations. In this paper, we propose a scalable data engine called TrustGeoGen for problem generation, with formal verification to provide a principled benchmark, which we believe lays the foundation for the further development of methods for GPS. The engine synthesizes geometric data through four key innovations: 1) multimodal-aligned generation of diagrams, textual descriptions, and stepwise solutions; 2) formal verification ensuring rule-compliant reasoning paths; 3) a bootstrapping mechanism enabling complexity escalation via recursive state generation; and 4) our GeoExplore series of algorithms, which simultaneously produce multi-solution variants and self-reflective backtracking traces. Through formal logical verification, TrustGeoGen produces the GeoTrust-200K dataset with guaranteed modality integrity, along with the GeoTrust-test test set. Experiments reveal that state-of-the-art models achieve only 49.17% accuracy on GeoTrust-test, demonstrating its evaluation stringency. Crucially, models trained on GeoTrust achieve OOD generalization on GeoQA, significantly reducing logical inconsistencies relative to pseudo-labels annotated by OpenAI-o1. Our code is available at https://github.com/Alpha-Innovator/TrustGeoGen
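The abstract does not spell out how GeoExplore finds multiple solutions or records backtracking. As a rough illustration only, the sketch below uses plain backward chaining over toy ground rules. Every alternative way of proving the goal becomes a separate solution, and every rule whose premises cannot be established is logged as a backtracking event. The rule names `r1` to `r4`, the facts, and the `proofs` function are hypothetical. This is a generic search technique, not the authors' GeoExplore algorithm.

```python
from itertools import product

# Toy ground rules: (rule_name, premises, conclusion).
# These are hypothetical placeholders, not TrustGeoGen's rule set.
RULES = [
    ("r1", ("a",), "c"),
    ("r2", ("b",), "c"),
    ("r3", ("c", "b"), "goal"),
    ("r4", ("x",), "goal"),  # "x" is never derivable, which forces a backtrack
]


def proofs(goal, facts, rules, trace, visiting=frozenset()):
    """Yield every proof of `goal` as an ordered list of rule names.

    Failed rule attempts are appended to `trace` as a simple
    backtracking record (an illustrative assumption).
    """
    if goal in facts:
        yield []  # already given: the empty proof
        return
    if goal in visiting:
        return  # cycle guard: never try to prove a goal inside its own proof
    for name, premises, conclusion in rules:
        if conclusion != goal:
            continue
        # Collect all alternative proofs of each premise.
        sub = [list(proofs(p, facts, rules, trace, visiting | {goal}))
               for p in premises]
        if any(not alternatives for alternatives in sub):
            trace.append(f"backtrack: {name} cannot establish {goal}")
            continue
        # Each combination of premise proofs yields a distinct solution.
        for combo in product(*sub):
            steps = [step for part in combo for step in part]
            yield list(dict.fromkeys(steps)) + [name]  # drop repeated steps


if __name__ == "__main__":
    trace = []
    solutions = list(proofs("goal", {"a", "b"}, RULES, trace))
    print(solutions)  # [['r1', 'r3'], ['r2', 'r3']]
    print(trace)      # ['backtrack: r4 cannot establish goal']
```

Enumerating every combination of premise proofs grows exponentially in general. A real engine would presumably bound depth or sample paths, which this sketch leaves out.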
Problem

Research questions and friction points this paper is trying to address.

Ensuring trustworthy multimodal geometric problem solving with formal verification
Addressing noise and contradictions in synthetic geometric benchmarks
Enhancing logical coherence and scalability in geometric data generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal-aligned generation of diagrams and descriptions
Formal verification for rule-compliant reasoning paths
Bootstrapping mechanism enables complexity escalation
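Rule-compliant verification of a reasoning path can be sketched as a step-by-step check. A step is accepted only if every premise it cites is already established and the named rule, applied to those premises, yields exactly the claimed conclusion. The minimal sketch below assumes that facts are 3-tuples and uses two hypothetical rules (`para_trans`, `perp_perp_para`). `Step` and `verify_path` are illustrative names, and TrustGeoGen's actual rule base and verifier may differ.

```python
from dataclasses import dataclass

# Facts are toy 3-tuples such as ("para", "AB", "CD") meaning "AB is parallel to CD".
# The two rules below are illustrative assumptions, not TrustGeoGen's rule base.


def para_trans(premises):
    """para(a, b) and para(b, c) imply para(a, c)."""
    if len(premises) != 2:
        return None
    (r1, a, b), (r2, b2, c) = premises
    if r1 == r2 == "para" and b == b2 and a != c:
        return ("para", a, c)
    return None


def perp_perp_para(premises):
    """perp(a, l) and perp(c, l) imply para(a, c)."""
    if len(premises) != 2:
        return None
    (r1, a, l1), (r2, c, l2) = premises
    if r1 == r2 == "perp" and l1 == l2 and a != c:
        return ("para", a, c)
    return None


RULES = {"para_trans": para_trans, "perp_perp_para": perp_perp_para}


@dataclass(frozen=True)
class Step:
    rule: str          # name of the rule applied at this step
    premises: tuple    # facts the step cites
    conclusion: tuple  # fact the step claims to derive


def verify_path(initial_facts, steps, goal, rules=RULES):
    """Accept a reasoning path only if every step is rule-compliant."""
    known = set(initial_facts)
    for step in steps:
        if not all(p in known for p in step.premises):
            return False  # cites a fact that has not been established yet
        rule = rules.get(step.rule)
        if rule is None or rule(step.premises) != step.conclusion:
            return False  # unknown rule, or the rule does not license this conclusion
        known.add(step.conclusion)
    return goal in known


if __name__ == "__main__":
    given = {("perp", "AB", "EF"), ("perp", "CD", "EF"), ("para", "CD", "GH")}
    path = [
        Step("perp_perp_para", (("perp", "AB", "EF"), ("perp", "CD", "EF")), ("para", "AB", "CD")),
        Step("para_trans", (("para", "AB", "CD"), ("para", "CD", "GH")), ("para", "AB", "GH")),
    ]
    print(verify_path(given, path, ("para", "AB", "GH")))      # True
    print(verify_path(given, path[1:], ("para", "AB", "GH")))  # False: AB || CD not yet known
```

Checking each step against the rule that names it, rather than trusting the stated conclusion, is what lets a generator reject the kind of self-contradictory steps the abstract attributes to LLM-synthesized data.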
Authors
Daocheng Fu
Fudan University, Shanghai Artificial Intelligence Laboratory
Traffic simulation, Large language model, Autonomous driving
Zijun Chen
Shanghai Jiao Tong University, Shanghai Artificial Intelligence Laboratory
Renqiu Xia
SJTU
LLM, VLM
Qi Liu
Shanghai Jiao Tong University, Shanghai Artificial Intelligence Laboratory
Yuan Feng
Shanghai Jiao Tong University, Shanghai Artificial Intelligence Laboratory
Hongbin Zhou
Shanghai AI Laboratory
Renrui Zhang
Seed ByteDance & MMLab & PKU
Large Multimodal Model, Generative Model, Embodied AI
Shiyang Feng
Researcher
AI for Science
Peng Gao
Shanghai Artificial Intelligence Laboratory
Junchi Yan
FIAPR & ICML Board Member, SJTU (2018-), SII (2024-), AWS (2019-2022), IBM (2011-2018)
Computational Intelligence, AI4Science, Machine Learning, Autonomous Driving
Botian Shi
Shanghai Artificial Intelligence Laboratory
VLMs, Document Understanding, Autonomous Driving
Bo Zhang
Shanghai Artificial Intelligence Laboratory
Yu Qiao
Shanghai Artificial Intelligence Laboratory