GeoFM: Enhancing Geometric Reasoning of MLLMs via Synthetic Data Generation through Formal Language

📅 2025-10-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
Multimodal large language models (MLLMs) have limited geometric reasoning capabilities because high-quality geometric data is scarce, and existing synthetic data generation methods suffer from low diversity, high noise, and poor image fidelity. Method: We propose GeoFM, a framework that describes geometric structures in a formal language, explores combinations of geometric conditions within metric space, and uses a symbolic engine to verify correctness, synthesizing high-fidelity, diverse geometric images together with their problem texts. Unlike rephrasing- or template-based approaches, GeoFM generates problems that differ from the originals while remaining verifiably correct. Contribution/Results: A model trained with GeoFM-synthesized data outperforms GPT-4o by 18.7% on the geometry problem-solving tasks in MathVista and by 16.5% on GeoQA, and exceeds a leading open-source model by 5.7% and 2.7%, respectively.

📝 Abstract
Multi-modal Large Language Models (MLLMs) have gained significant attention in both academia and industry for their capabilities in handling multi-modal tasks. However, these models face challenges in mathematical geometric reasoning due to the scarcity of high-quality geometric data. To address this issue, synthesizing geometric data has become an essential strategy. Current methods for generating synthetic geometric data either rephrase or expand existing problems, or use predefined rules and templates to create geometric images and problems. However, these approaches often produce data that lacks diversity or is prone to noise. Additionally, the geometric images synthesized by existing methods tend to exhibit limited variation and deviate significantly from authentic geometric diagrams. To overcome these limitations, we propose GeoFM, a novel method for synthesizing geometric data. GeoFM uses formal languages to explore combinations of conditions within metric space, generating high-fidelity geometric problems that differ from the originals while ensuring correctness through a symbolic engine. Experimental results show that our synthetic data significantly outperforms that of existing methods. The model trained with our data surpasses the proprietary GPT-4o model by 18.7% on geometry problem-solving tasks in MathVista and by 16.5% on GeoQA. Additionally, it exceeds the performance of a leading open-source model by 5.7% on MathVista and by 2.7% on GeoQA.
Problem

Research questions and friction points this paper is trying to address.

Addressing geometric reasoning challenges in MLLMs through synthetic data generation
Overcoming limited diversity and noise in existing geometric data synthesis methods
Improving geometric problem-solving accuracy using formal language and symbolic verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses formal languages to explore geometric condition combinations
Generates high-fidelity geometric problems through symbolic engine
Creates diverse synthetic data in metric space for training
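The idea in the points above (enumerate combinations of formally stated conditions, then keep only problems whose answers pass an exact check) can be illustrated with a toy sketch. This is not GeoFM's formal language or its symbolic engine, neither of which is specified here. The `RightTriangleProblem` class, the `Perpendicular`/`Length` statement syntax, and the helper names are all hypothetical, and plain integer arithmetic stands in for a real symbolic engine.

```python
from dataclasses import dataclass
from itertools import combinations
from math import isqrt


@dataclass(frozen=True)
class RightTriangleProblem:
    """Hypothetical formal description: right angle at A, legs AB and AC."""
    ab: int
    ac: int

    def conditions(self):
        # Formal-language-style statements (syntax invented for this sketch).
        return [
            "Perpendicular(AB, AC)",
            f"Length(AB) = {self.ab}",
            f"Length(AC) = {self.ac}",
        ]

    def question(self):
        return "Find Length(BC)."


def verified_answer(problem):
    """Return the exact integer hypotenuse, or None if it is not an integer."""
    squared = problem.ab ** 2 + problem.ac ** 2
    root = isqrt(squared)
    return root if root * root == squared else None


def synthesize(max_leg):
    """Enumerate leg combinations and keep those with a verified exact answer."""
    kept = []
    for ab, ac in combinations(range(1, max_leg + 1), 2):
        problem = RightTriangleProblem(ab, ac)
        answer = verified_answer(problem)
        if answer is not None:
            kept.append((problem, answer))
    return kept


if __name__ == "__main__":
    for problem, answer in synthesize(12):
        print("; ".join(problem.conditions()), "|", problem.question(), "->", answer)
```

Candidates that fail the exact check are discarded rather than rounded, which mirrors, in a very reduced form, the role a verification step plays in keeping noisy problems out of a synthetic dataset.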
Yuhao Zhang
Tencent Hunyuan Team
Dingxin Hu
Tencent Hunyuan Team
Tinghao Yu
Tencent Hunyuan Team
Hao Liu
Tencent Hunyuan Team
Yiting Liu
University of California San Diego
EDA, VLSI Physical Design, Machine Learning, Data Privacy Protection