🤖 AI Summary
Plane Geometry Problem Solving (PGPS) serves as a critical benchmark for evaluating geometric reasoning in multimodal large language models, yet no systematic survey of the field exists. This paper introduces the first unified classification framework for PGPS methods, grounded in the encoder-decoder paradigm, and systematically analyzes state-of-the-art approaches along three dimensions: model architecture, output format, and benchmark design. The analysis identifies two fundamental challenges: (1) visual-symbol mapping hallucination during encoding, and (2) data leakage risks inherent in current benchmarks. Through multimodal reasoning diagnostics, architectural abstraction, and benchmark vulnerability assessment, the survey clarifies key technical bottlenecks and proposes concrete future directions: scalable encoding schemes, leakage-resistant benchmark construction, and formal verification protocols. It thereby provides both theoretical foundations and practical guidance for advancing geometric reasoning in vision-language models.
📝 Abstract
Plane geometry problem solving (PGPS) has recently gained significant attention as a benchmark for assessing the multimodal reasoning capabilities of large vision-language models. Despite this growing interest, the research community still lacks a comprehensive overview that systematically synthesizes recent PGPS work. To fill this gap, we present a survey of existing PGPS studies. We first organize PGPS methods within an encoder-decoder framework and summarize the output formats used by their encoders and decoders. We then classify and analyze these encoders and decoders according to their architectural designs. Finally, we outline major challenges and promising directions for future research, focusing in particular on the hallucination issues that arise during the encoding phase of encoder-decoder architectures and on the problem of data leakage in current PGPS benchmarks.