Geoparsing: Diagram Parsing for Plane and Solid Geometry with a Unified Formal Language

📅 2026-04-13
🤖 AI Summary
This work addresses the limited ability of multimodal large language models to accurately perceive fine-grained visual elements and understand three-dimensional spatial relationships in geometric reasoning. To bridge this gap, the authors introduce a unified formal language that covers both plane and solid geometry, along with GDP-29K, a large-scale dataset of 29,000 real-world diagram–description pairs. By combining supervised fine-tuning with reinforcement learning guided by verifiable rewards, the proposed approach trains models to parse geometric diagrams into formal descriptions. The method achieves state-of-the-art performance on diagram parsing tasks and substantially improves downstream geometric reasoning.

📝 Abstract
Multimodal Large Language Models (MLLMs) have achieved remarkable progress but continue to struggle with geometric reasoning, primarily due to the perception bottleneck regarding fine-grained visual elements. While formal languages have aided plane geometry understanding, solid geometry, which requires spatial understanding, remains largely unexplored. In this paper, we address this challenge by designing a unified formal language that integrates plane and solid geometry, comprehensively covering geometric structures and semantic relations. We construct GDP-29K, a large-scale dataset comprising 20k plane and 9k solid geometry samples collected from diverse real-world sources, each paired with its ground-truth formal description. To ensure syntactic correctness and geometric consistency, we propose a training paradigm that combines Supervised Fine-Tuning with Reinforcement Learning via Verifiable Rewards. Experiments show that our approach achieves state-of-the-art parsing performance. Furthermore, we demonstrate that our parsed formal descriptions serve as a critical cognitive scaffold, significantly boosting MLLMs' capabilities for downstream geometry reasoning tasks. Our data and code are available at Geoparsing.
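
To make the idea of a verifiable reward concrete, the sketch below scores a predicted formal description against a reference one. It is an illustration only: the predicate syntax (`Line(A,B)`), the function names `parse_description` and `verifiable_reward`, and the F1-based scoring are assumptions made for this example, not the paper's formal language or its actual reward design.

```python
import re

# Hypothetical predicate syntax, e.g. "Line(A,B)". This is an assumption for
# illustration and is NOT the formal language defined in the paper.
PREDICATE = re.compile(r"^([A-Z][A-Za-z]*)\(([A-Za-z0-9]+(?:,[A-Za-z0-9]+)*)\)$")


def parse_description(text):
    """Return a set of (name, args) facts, or None if any line is malformed."""
    facts = set()
    for line in text.strip().splitlines():
        line = line.replace(" ", "")  # tolerate spaces after commas
        if not line:
            continue
        match = PREDICATE.match(line)
        if match is None:
            return None  # syntax check failed
        facts.add((match.group(1), tuple(match.group(2).split(","))))
    return facts


def verifiable_reward(predicted, reference):
    """Return 0.0 for unparseable output, else the F1 overlap of the fact sets."""
    pred = parse_description(predicted)
    gold = parse_description(reference)
    if pred is None or not gold:
        return 0.0
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


reference = "Line(A,B)\nLine(B,C)\nPerpendicular(AB,BC)"
predicted = "Line(A, B)\nLine(B, C)"
print(verifiable_reward(predicted, reference))
```

Because the score is computed by a deterministic program rather than a learned judge, it can be checked automatically for every sampled output, which is the property that makes such a reward "verifiable". A real reward for this task would presumably also need to test geometric consistency, which this sketch does not attempt.
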
Problem

Research questions and friction points this paper is trying to address.

geometric reasoning
multimodal large language models
formal language
plane geometry
solid geometry
Innovation

Methods, ideas, or system contributions that make the work stand out.

geoparsing
unified formal language
solid geometry reasoning
multimodal LLMs
verifiable reinforcement learning
Authors

Peijie Wang
Institute of Automation, Chinese Academy of Sciences
Multimodal LLMs, math reasoning

Ming-Liang Zhang
PhD, Senior Algorithm Engineer at Alibaba Beijing
Multimodal Reasoning, Math Problem Solving, Scene Parsing

Jun Cao
MAIS, Institute of Automation of Chinese Academy of Sciences, School of Artificial Intelligence, University of Chinese Academy of Sciences

Chao Deng
MAIS, Institute of Automation of Chinese Academy of Sciences, School of Artificial Intelligence, University of Chinese Academy of Sciences

Dekang Ran
MAIS, Institute of Automation of Chinese Academy of Sciences, School of Artificial Intelligence, University of Chinese Academy of Sciences

Hongda Sun
Renmin University of China
Natural Language Processing, Large Language Models, AI for Healthcare

Pi Bu
Future Living Lab of Alibaba

Xuan Zhang
Future Living Lab of Alibaba

Yingyao Wang
Alibaba Group, Harbin Institute of Technology
LVLM, Question Answering, Knowledge Reasoning

Jun Song
Shenzhen University
nanophotonics

Bo Zheng
Researcher, Alibaba Group
AI, Network, E-Commerce

Fei Yin
NLPR, CASIA
OCR, pattern recognition

Cheng-Lin Liu
Institute of Automation, Chinese Academy of Sciences
pattern recognition, character recognition, document analysis, machine learning