🤖 AI Summary
This work addresses the limited ability of multimodal large language models to accurately perceive fine-grained visual elements and understand three-dimensional spatial relationships in geometric reasoning. To bridge this gap, the authors introduce a unified formal language that encompasses both planar and solid geometry, along with GDP-29K—a large-scale dataset comprising 29,000 real-world diagram–description pairs. By combining supervised fine-tuning with reinforcement learning guided by verifiable rewards, the proposed approach enables models to precisely parse geometric diagrams into formal descriptions. The method achieves state-of-the-art performance on diagram parsing tasks and substantially enhances downstream geometric reasoning capabilities.
📝 Abstract
Multimodal Large Language Models (MLLMs) have achieved remarkable progress but continue to struggle with geometric reasoning, primarily due to a perception bottleneck in recognizing fine-grained visual elements. While formal languages have aided plane geometry understanding, solid geometry, which additionally requires spatial understanding, remains largely unexplored. In this paper, we address this challenge by designing a unified formal language that integrates plane and solid geometry, comprehensively covering geometric structures and semantic relations. We construct GDP-29K, a large-scale dataset comprising 20k plane and 9k solid geometry samples collected from diverse real-world sources, each paired with its ground-truth formal description. To ensure syntactic correctness and geometric consistency, we propose a training paradigm that combines Supervised Fine-Tuning with Reinforcement Learning via Verifiable Rewards. Experiments show that our approach achieves state-of-the-art parsing performance. Furthermore, we demonstrate that the parsed formal descriptions serve as a critical cognitive scaffold, significantly boosting MLLMs' capabilities on downstream geometry reasoning tasks. Our data and code are available at Geoparsing.
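To make the "verifiable rewards" idea concrete, here is a minimal sketch of what a reward checker for formal geometry descriptions could look like. The clause grammar (`Point(A)`, `Segment(A,B)`, etc.) and the all-or-nothing scoring are hypothetical illustrations, not the paper's actual formal language or reward function: the reward is nonzero only if every clause is syntactically well formed and geometrically consistent in the sense that relations reference only declared points.

```python
import re

# Hypothetical clause syntax: Head(P1,P2,...) with single-letter points.
# The paper's actual unified formal language is richer than this.
CLAUSE = re.compile(r"^([A-Za-z]+)\(([A-Z](?:,[A-Z])*)\)$")

def verifiable_reward(clauses):
    """Return 1.0 iff every clause parses and every relation
    mentions only points previously declared via Point(...)."""
    declared = set()
    parsed = []
    for clause in clauses:
        m = CLAUSE.match(clause.replace(" ", ""))
        if m is None:
            return 0.0  # syntax error -> zero reward
        head, args = m.group(1), m.group(2).split(",")
        parsed.append((head, args))
        if head == "Point":
            declared.update(args)
    # Consistency check: relations may only use declared points.
    for head, args in parsed:
        if head != "Point" and any(a not in declared for a in args):
            return 0.0
    return 1.0

good = ["Point(A)", "Point(B)", "Segment(A,B)"]
bad = ["Segment(A,B)"]  # references undeclared points
```

In an RLVR loop, such a deterministic checker would score each sampled formal description, letting the policy be optimized against objective correctness signals rather than learned reward models.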