GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving

📅 2026-02-09
🤖 AI Summary
This work addresses the difficulty Large Multimodal Models (LMMs) have in modeling global shape and local geometric relationships at the same time when solving geometry problems. The authors propose GeoFocus, a framework with two modules. The Critical Local Perceptor uses thirteen theory-based perception templates to identify and emphasize critical local structures such as angles, parallel lines, and comparative distances. VertexLang is a compact topology formal language that encodes the global figure through vertex coordinates and connectivity relations, replacing bulkier code-based encodings. Evaluated on the Geo3K, GeoQA, and FormalGeo7K benchmarks, GeoFocus improves accuracy by 4.7% over leading specialized models, demonstrates superior robustness on MATHVERSE under diverse visual conditions, and reduces global perception training time by 20%.
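To make the kind of local relationship mentioned above concrete, here is a minimal Python sketch of two checks (an angle at a vertex and a parallelism test) computed from vertex coordinates. The function names `angle_deg` and `are_parallel` and the tolerance value are illustrative assumptions; the summary does not describe how GeoFocus's thirteen templates are implemented, so this is not the authors' method.

```python
import math

# Hypothetical helpers for illustration only; not taken from GeoFocus.

def angle_deg(a, vertex, b):
    """Return the unsigned angle (degrees) at `vertex` between rays to a and b."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2 of |cross| and dot is numerically stable near 0 and 180 degrees.
    return math.degrees(math.atan2(abs(cross), dot))

def are_parallel(p1, p2, q1, q2, tol=1e-9):
    """Return True if segment p1-p2 is parallel to segment q1-q2."""
    d1 = (p2[0] - p1[0], p2[1] - p1[1])
    d2 = (q2[0] - q1[0], q2[1] - q1[1])
    # Two direction vectors are parallel when their 2D cross product is zero.
    return abs(d1[0] * d2[1] - d1[1] * d2[0]) <= tol
```

For example, `angle_deg((1, 0), (0, 0), (0, 1))` measures a right angle at the origin, and `are_parallel((0, 0), (1, 0), (0, 1), (2, 1))` compares two horizontal segments.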

📝 Abstract
Geometry problem-solving remains a significant challenge for Large Multimodal Models (LMMs), requiring not only global shape recognition but also attention to intricate local relationships related to geometric theory. To address this, we propose GeoFocus, a novel framework comprising two core modules. 1) Critical Local Perceptor, which automatically identifies and emphasizes critical local structures (e.g., angles, parallel lines, comparative distances) through thirteen theory-based perception templates, boosting critical local feature coverage by 61% compared to previous methods. 2) VertexLang, a compact topology formal language, which encodes global figures through vertex coordinates and connectivity relations. By replacing bulky code-based encodings, VertexLang reduces global perception training time by 20% while improving topology recognition accuracy. When evaluated on Geo3K, GeoQA, and FormalGeo7K, GeoFocus achieves a 4.7% accuracy improvement over leading specialized models and demonstrates superior robustness on MATHVERSE under diverse visual conditions. Project page: https://github.com/dle666/GeoFocus
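As a rough illustration of what encoding a figure "through vertex coordinates and connectivity relations" can look like, the sketch below serializes a figure into a single compact string. The `Figure` class, its `encode` method, and the `V: ... | E: ...` output format are all hypothetical; the abstract does not give the actual VertexLang syntax, so this should be read as an assumption-laden sketch of the general idea rather than the language itself.

```python
from dataclasses import dataclass, field

@dataclass
class Figure:
    """Hypothetical container: named vertices plus the segments joining them."""
    vertices: dict                      # vertex name -> (x, y) coordinates
    edges: list = field(default_factory=list)  # pairs of vertex names

    def encode(self) -> str:
        # Serialize vertices in name order so the encoding is deterministic.
        pts = "; ".join(
            f"{name}({x:g},{y:g})"
            for name, (x, y) in sorted(self.vertices.items())
        )
        # Connectivity is listed as name pairs, in the order given.
        segs = "; ".join(f"{a}-{b}" for a, b in self.edges)
        return f"V: {pts} | E: {segs}"


triangle = Figure(
    vertices={"A": (0, 0), "B": (4, 0), "C": (0, 3)},
    edges=[("A", "B"), ("B", "C"), ("C", "A")],
)
print(triangle.encode())
```

A string of this kind is far shorter than drawing code that reproduces the same figure, which is the intuition behind replacing bulky code-based encodings with a topology-level description.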
Problem

Research questions and friction points this paper is trying to address.

geometry problem-solving
Large Multimodal Models
global-to-local perception
local relationships
geometric theory
Innovation

Methods, ideas, or system contributions that make the work stand out.

Critical Local Perceptor
VertexLang
multimodal geometry reasoning
topology formal language
global-to-local perception
Linger Deng
Huazhong University of Science and Technology
Computer Vision, Multimodal Large Language Models, Optical Character Recognition
Yuliang Liu
School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
Wenwen Yu
Huazhong University of Science and Technology
Computer Vision, OCR, Document Understanding, Large Multimodal Models
Zujia Zhang
School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
Jianzhong Ju
MiLM Plus, Xiaomi Inc, Beijing, 100000, China
Zhenbo Luo
Xiaomi
Vision Language Model, Computer Vision
Xiang Bai
Huazhong University of Science and Technology (HUST)
Computer Vision, OCR