TrustGeoGen: Scalable and Formal-Verified Data Engine for Trustworthy Multi-modal Geometric Problem Solving

📅 2025-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Current multimodal geometric problem solving (GPS) approaches face three key bottlenecks: insufficient multimodal fusion, logical inconsistency, and cross-modal misalignment. Moreover, existing synthetic geometry datasets lack formal verification and contain noise and self-contradictory information. To address these issues, we propose TrustGeoGen, a scalable data engine built on four mechanisms: (1) multimodal-aligned generation of diagrams, textual descriptions, and stepwise solutions, (2) formal verification of reasoning paths for rule compliance, (3) a bootstrapping mechanism that escalates complexity through recursive state generation, and (4) the GeoExplore series of algorithms, which produce multi-solution variants and self-reflective backtracking traces. The engine produces the GeoTrust-200K dataset (training) and the GeoTrust-test test set (evaluation). State-of-the-art models achieve only 49.17% accuracy on GeoTrust-test. Models trained on GeoTrust generalize out of distribution to GeoQA, significantly reducing logical inconsistencies relative to pseudo-labels annotated by OpenAI-o1.

📝 Abstract

Mathematical geometric problem solving (GPS) often requires effective integration of multimodal information and verifiable logical coherence. Despite the rapid development of large language models for general problem solving, GPS remains unresolved in both methodology and benchmarks, especially because existing synthetic GPS benchmarks are often not self-verified and contain noise and self-contradictory information due to LLM hallucination. In this paper, we propose TrustGeoGen, a scalable data engine for problem generation with formal verification, to provide a principled benchmark that we believe lays the foundation for further development of GPS methods. The engine synthesizes geometric data through four key innovations: 1) multimodal-aligned generation of diagrams, textual descriptions, and stepwise solutions; 2) formal verification ensuring rule-compliant reasoning paths; 3) a bootstrapping mechanism enabling complexity escalation via recursive state generation; and 4) the GeoExplore series of algorithms, which simultaneously produce multi-solution variants and self-reflective backtracking traces. Through formal logical verification, TrustGeoGen produces the GeoTrust-200K dataset with guaranteed modality integrity, along with the GeoTrust-test test set. Experiments reveal that state-of-the-art models achieve only 49.17% accuracy on GeoTrust-test, demonstrating its evaluation stringency. Crucially, models trained on GeoTrust achieve OOD generalization on GeoQA, significantly reducing logical inconsistencies relative to pseudo-labels annotated by OpenAI-o1. Our code is available at https://github.com/Alpha-Innovator/TrustGeoGen
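
To make the idea of "rule-compliant reasoning paths" concrete, here is a minimal toy sketch in Python. It is not the TrustGeoGen implementation: the fact encoding, the two rules (`para_trans`, `perp_para`), the `Step` record, and `verify_path` are all hypothetical names chosen for illustration. The sketch only shows the general shape of such a check: every step must cite a known rule, use only facts that are already established, and reach exactly the conclusion the rule yields.

```python
from dataclasses import dataclass

# Hypothetical encoding: a fact is a (predicate, object, object) triple,
# e.g. ("para", "AB", "CD") for "AB is parallel to CD".
Fact = tuple[str, str, str]


@dataclass(frozen=True)
class Step:
    rule: str                    # name of the rule the step claims to apply
    premises: tuple[Fact, ...]   # facts the step relies on
    conclusion: Fact             # fact the step claims to derive


def para_trans(premises):
    """para(x, y) and para(y, z) give para(x, z)."""
    if len(premises) != 2:
        return None
    (p1, a, b), (p2, c, d) = premises
    if p1 == p2 == "para" and b == c:
        return ("para", a, d)
    return None


def perp_para(premises):
    """perp(x, y) and para(y, z) give perp(x, z)."""
    if len(premises) != 2:
        return None
    (p1, a, b), (p2, c, d) = premises
    if p1 == "perp" and p2 == "para" and b == c:
        return ("perp", a, d)
    return None


# Illustrative rule table; a real engine would carry a far larger rule base.
RULES = {"para_trans": para_trans, "perp_para": perp_para}


def verify_path(given, steps):
    """Return (True, "ok") if every step is rule-compliant, else (False, reason)."""
    known = set(given)
    for i, step in enumerate(steps):
        rule = RULES.get(step.rule)
        if rule is None:
            return False, f"step {i}: unknown rule {step.rule!r}"
        missing = [p for p in step.premises if p not in known]
        if missing:
            return False, f"step {i}: premise not established: {missing[0]}"
        if rule(step.premises) != step.conclusion:
            return False, f"step {i}: rule does not yield {step.conclusion}"
        known.add(step.conclusion)  # the conclusion becomes usable by later steps
    return True, "ok"
```

A path that skips an intermediate step, or that states a conclusion the cited rule cannot produce, is rejected with the index of the offending step. That per-step rejection is the property that lets a checker of this kind filter out self-contradictory solutions.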

Problem

Research questions and friction points this paper is trying to address.

Ensuring trustworthy multimodal geometric problem solving with formal verification

Addressing noise and contradictions in synthetic geometric benchmarks

Enhancing logical coherence and scalability in geometric data generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal-aligned generation of diagrams and descriptions

Formal verification for rule-compliant reasoning paths

Bootstrapping mechanism for complexity escalation
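
As a rough illustration of complexity escalation through repeated state generation, the toy Python sketch below grows a set of facts round by round and records the round in which each fact first appears. The single transitivity rule, the fact encoding, and the `bootstrap` function are assumptions made for this example and are not taken from TrustGeoGen. The recorded round serves here as a simple stand-in for problem difficulty, since a fact first derived in round r needs a chain of r rule applications.

```python
from itertools import permutations


def bootstrap(premises, max_rounds):
    """Expand a fact set round by round; map each fact to its first round.

    Facts are hypothetical ("para", x, y) triples. Round 0 holds the given
    premises, and each later round applies transitivity to everything known
    so far, so facts from later rounds need longer derivations.
    """
    depth = {fact: 0 for fact in premises}
    for round_no in range(1, max_rounds + 1):
        new_facts = set()
        for f1, f2 in permutations(list(depth), 2):
            (p1, a, b), (p2, c, d) = f1, f2
            # para(a, b) and para(b, d) give para(a, d)
            if p1 == p2 == "para" and b == c and a != d:
                derived = ("para", a, d)
                if derived not in depth:
                    new_facts.add(derived)
        if not new_facts:
            break  # the state has stopped growing
        for fact in new_facts:
            depth[fact] = round_no
    return depth


chain = [("para", "l1", "l2"), ("para", "l2", "l3"),
         ("para", "l3", "l4"), ("para", "l4", "l5")]
depths = bootstrap(chain, max_rounds=5)
hardest = [fact for fact, d in depths.items() if d == max(depths.values())]
```

In this sketch, picking targets from `hardest` gives questions whose shortest derivation is longest, which is one simple way to steer generation toward harder problems.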

🔎 Similar Papers

FGeo-HyperGNet: Geometry Problem Solving Integrating Formal Symbolic System and Hypergraph Neural Network