GeoFocus: Blending Efficient Global-to-Local Perception for Multimodal Geometry Problem-Solving

📅 2026-02-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the difficulty Large Multimodal Models (LMMs) have in jointly modeling global shape and local geometric relationships when solving geometry problems. The authors propose GeoFocus, a framework with two modules. The Critical Local Perceptor identifies and emphasizes critical local structures using thirteen theory-based perception templates. VertexLang, a compact formal language built on vertex coordinates and connectivity relations, replaces bulky code-based encodings of the global figure. On Geo3K, GeoQA, and FormalGeo7K, GeoFocus improves accuracy by 4.7% over leading specialized models; it is also more robust on MATHVERSE, and VertexLang reduces global perception training time by 20%.

📝 Abstract

Geometry problem-solving remains a significant challenge for Large Multimodal Models (LMMs), requiring not only global shape recognition but also attention to intricate local relationships related to geometric theory. To address this, we propose GeoFocus, a novel framework comprising two core modules. 1) Critical Local Perceptor, which automatically identifies and emphasizes critical local structures (e.g., angles, parallel lines, comparative distances) through thirteen theory-based perception templates, boosting critical local feature coverage by 61% compared to previous methods. 2) VertexLang, a compact topology formal language, which encodes global figures through vertex coordinates and connectivity relations. By replacing bulky code-based encodings, VertexLang reduces global perception training time by 20% while improving topology recognition accuracy. When evaluated on Geo3K, GeoQA, and FormalGeo7K, GeoFocus achieves a 4.7% accuracy improvement over leading specialized models and demonstrates superior robustness on MATHVERSE under diverse visual conditions. Project page: https://github.com/dle666/GeoFocus
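To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a figure can be described by named vertex coordinates plus a list of connected vertex pairs, and it derives three of the local relations the abstract mentions (an angle, a parallel-line check, a distance comparison). The serialization format and every function name here are hypothetical; the actual VertexLang syntax and the thirteen perception templates are not specified on this page.

```python
import math

# Hypothetical figure: trapezoid ABCD. Coordinates and labels are illustrative only.
VERTICES = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (3.0, 2.0), "D": (1.0, 2.0)}
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]


def encode_figure(vertices, edges):
    """Serialize vertices and connectivity into one compact line (assumed format)."""
    pts = "; ".join(f"{name}({x:g},{y:g})" for name, (x, y) in sorted(vertices.items()))
    segs = ", ".join(f"{a}-{b}" for a, b in edges)
    return f"V: {pts} | E: {segs}"


def _vec(vertices, p, q):
    """Vector from vertex p to vertex q."""
    return (vertices[q][0] - vertices[p][0], vertices[q][1] - vertices[p][1])


def angle_deg(vertices, a, o, b):
    """Angle a-o-b measured at vertex o, in degrees."""
    ux, uy = _vec(vertices, o, a)
    vx, vy = _vec(vertices, o, b)
    cos_t = (ux * vx + uy * vy) / (math.hypot(ux, uy) * math.hypot(vx, vy))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_t))))


def is_parallel(vertices, e1, e2, tol=1e-9):
    """Two segments are parallel when the 2D cross product of their directions is zero."""
    ux, uy = _vec(vertices, *e1)
    vx, vy = _vec(vertices, *e2)
    return abs(ux * vy - uy * vx) <= tol


def compare_lengths(vertices, e1, e2, tol=1e-9):
    """Return '<', '=', or '>' for the length of e1 relative to e2."""
    d1 = math.hypot(*_vec(vertices, *e1))
    d2 = math.hypot(*_vec(vertices, *e2))
    if abs(d1 - d2) <= tol:
        return "="
    return "<" if d1 < d2 else ">"


if __name__ == "__main__":
    print(encode_figure(VERTICES, EDGES))
    print(angle_deg(VERTICES, "D", "A", "B"))
    print(is_parallel(VERTICES, ("A", "B"), ("C", "D")))
    print(compare_lengths(VERTICES, ("B", "C"), ("D", "A")))
```

The point of the sketch is only that a vertex-and-connectivity description is short and that local relations can be read off it directly. How GeoFocus actually trains on such encodings is described in the paper, not here.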

Problem

Research questions and friction points this paper is trying to address.

geometry problem-solving

Large Multimodal Models

global-to-local perception

local relationships

geometric theory

Innovation

Methods, ideas, or system contributions that make the work stand out.

Critical Local Perceptor

VertexLang

multimodal geometry reasoning