GIQ: Benchmarking 3D Geometric Reasoning of Vision Foundation Models with Simulated and Real Polyhedra

📅 2025-06-09

🤖 AI Summary

Problem: While state-of-the-art monocular 3D reconstruction methods and vision-language models (VLMs) achieve strong performance on standard benchmarks, their genuine understanding of fundamental geometric properties (such as convexity, face structure, and symmetry) remains poorly characterized. Method: We introduce GIQ, the first 3D geometric reasoning benchmark comprising 224 real and synthetic polyhedra, including Platonic and Archimedean solids. GIQ systematically evaluates models across four tasks: monocular 3D reconstruction, symmetry detection, mental rotation, and zero-shot shape classification, employing linear probe analysis and a standardized evaluation paradigm graded by complexity and symmetry. Contribution/Results: Experiments reveal that SOTA reconstruction methods fail to recover basic polyhedral topology; VLMs recognize simple symmetries but suffer sharp accuracy degradation on mental rotation; and top multimodal models misclassify complex polyhedral geometric attributes at rates exceeding 70%.

📝 Abstract

Monocular 3D reconstruction methods and vision-language models (VLMs) demonstrate impressive results on standard benchmarks, yet their true understanding of geometric properties remains unclear. We introduce GIQ, a comprehensive benchmark specifically designed to evaluate the geometric reasoning capabilities of vision and vision-language foundation models. GIQ comprises synthetic and real-world images of 224 diverse polyhedra (including Platonic, Archimedean, Johnson, and Catalan solids, as well as stellations and compound shapes) covering varying levels of complexity and symmetry. Through systematic experiments involving monocular 3D reconstruction, 3D symmetry detection, mental rotation tests, and zero-shot shape classification tasks, we reveal significant shortcomings in current models. State-of-the-art reconstruction algorithms trained on extensive 3D datasets struggle to reconstruct even basic geometric forms accurately. While foundation models effectively detect specific 3D symmetry elements via linear probing, they falter significantly in tasks requiring detailed geometric differentiation, such as mental rotation. Moreover, advanced vision-language assistants exhibit remarkably low accuracy on complex polyhedra, systematically misinterpreting basic properties like face geometry, convexity, and compound structures. GIQ is publicly available, providing a structured platform to highlight and address critical gaps in geometric intelligence, facilitating future progress in robust, geometry-aware representation learning.
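The linear probing mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the feature matrix below is random data standing in for frozen backbone embeddings, the binary label is a hypothetical "has a given symmetry element" flag, and the function names `train_linear_probe` and `probe_accuracy` are invented for illustration. The point is only that the backbone stays fixed and a single linear layer is trained on top of its features.

```python
import numpy as np

def train_linear_probe(features, labels, lr=0.5, epochs=500):
    """Fit a binary logistic-regression probe on frozen features.

    Only the probe weights (w, b) are learned; the features are
    treated as fixed outputs of a pretrained model.
    """
    n, d = features.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        logits = features @ w + b
        probs = 1.0 / (1.0 + np.exp(-logits))
        grad = probs - labels            # gradient of cross-entropy w.r.t. logits
        w -= lr * (features.T @ grad) / n
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(features, labels, w, b):
    """Fraction of samples whose thresholded logit matches the label."""
    preds = (features @ w + b) > 0
    return float((preds == labels.astype(bool)).mean())

# Assumption: synthetic stand-in for embeddings of 200 images, 16 dims each.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200).astype(float)
features = rng.normal(size=(200, 16))
# Inject a linearly decodable signal into one dimension (hypothetical).
features[:, 0] += 3.0 * (2 * labels - 1)

# Train on the first 150 samples, evaluate on the held-out 50.
w, b = train_linear_probe(features[:150], labels[:150])
acc = probe_accuracy(features[150:], labels[150:], w, b)
print(f"held-out probe accuracy: {acc:.2f}")
```

High held-out accuracy from such a probe indicates that the property is linearly decodable from the frozen features; it does not by itself show that the model can use the property in harder tasks such as mental rotation.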

Problem

Research questions and friction points this paper is trying to address.

Evaluating geometric reasoning in vision models using polyhedra

Assessing 3D reconstruction and symmetry detection capabilities

Testing zero-shot classification accuracy on complex geometric shapes

Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmarking 3D geometric reasoning with polyhedra

Evaluating models via symmetry detection and reconstruction

Assessing vision-language models on geometric differentiation
