NoReGeo: Non-Reasoning Geometry Benchmark

📅 2026-01-15
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This study investigates whether large language models possess native geometric understanding without relying on explicit reasoning or algebraic computation. To this end, the authors construct a benchmark comprising 2,500 questions spanning 25 distinct spatial scenarios and introduce a novel evaluation paradigm that exclusively requires direct geometric judgment. Through binary classification tasks and ablation studies, they assess model performance in pure geometric recognition. Experimental results reveal that even state-of-the-art models such as GPT-4 achieve at most 65% accuracy, and fine-tuning yields only marginal improvements. These findings indicate a fundamental limitation in current models’ innate geometric cognition, underscoring the necessity of incorporating geometric priors explicitly during early-stage training.

πŸ“ Abstract
We present NoReGeo, a novel benchmark designed to evaluate the intrinsic geometric understanding of large language models (LLMs) without relying on reasoning or algebraic computation. Unlike existing benchmarks that primarily assess models' proficiency in reasoning-based geometry, where solutions are derived using algebraic methods, NoReGeo focuses on evaluating whether LLMs can inherently encode spatial relationships and recognize geometric properties directly. Our benchmark comprises 2,500 trivial geometric problems spanning 25 categories, each carefully crafted to be solvable purely through native geometric understanding, assuming known object locations. We assess a range of state-of-the-art models on NoReGeo, including frontier models like GPT-4, observing that even the most advanced systems achieve at most 65% overall accuracy on binary classification tasks. Further, our ablation experiments demonstrate that such geometric understanding does not emerge through fine-tuning alone, indicating that effective training for geometric comprehension requires a specialized approach from the outset. Our findings highlight a significant gap in current LLMs' ability to natively grasp geometric concepts, providing a foundation for future research toward models with true geometric cognition.
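The evaluation described above reduces to scoring yes/no judgments against gold labels. As a minimal sketch of that setup, the snippet below scores a toy "model" on binary geometric-judgment items with known object locations; the item schema, the question phrasing, and the `toy_model` heuristic are illustrative assumptions, not the benchmark's actual format.

```python
# Hypothetical NoReGeo-style scoring sketch: each item is a yes/no geometric
# judgment over named points with known coordinates, and accuracy is the
# fraction of items where the model's binary answer matches the gold label.

def toy_model(question: str, points: dict) -> bool:
    """Placeholder 'model' handling one judgment type ("Is A left of B?")
    by comparing x-coordinates of the two named points."""
    a, b = points["A"], points["B"]
    return a[0] < b[0]

def accuracy(items, model) -> float:
    # Count items where the model's boolean answer equals the gold label.
    correct = sum(model(it["question"], it["points"]) == it["label"] for it in items)
    return correct / len(items)

items = [
    {"question": "Is A left of B?", "points": {"A": (0, 0), "B": (2, 1)}, "label": True},
    {"question": "Is A left of B?", "points": {"A": (3, 0), "B": (1, 1)}, "label": False},
    {"question": "Is A left of B?", "points": {"A": (1, 5), "B": (0, 5)}, "label": False},
    {"question": "Is A left of B?", "points": {"A": (-1, 2), "B": (4, 2)}, "label": True},
]

print(accuracy(items, toy_model))  # → 1.0 on this toy set
```

An actual LLM evaluation would replace `toy_model` with a call that prompts the model and parses a yes/no answer, but the accuracy bookkeeping stays the same.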
Problem

Research questions and friction points this paper is trying to address.

geometric understanding
large language models
spatial relationships
non-reasoning geometry
geometric cognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

geometric understanding
large language models
non-reasoning benchmark
spatial reasoning
model evaluation
Irina Abdullaeva
Researcher, Multimodal Group, FusionBrain Lab
Multi-modality, Natural Language Processing, Language Models, Multi-agent LLMs, Computer Vision
Anton Vasiliuk
FusionBrain Lab, Russia
Elizaveta Goncharova
FusionBrain Lab, Russia; HSE University, Russia
Temur Rahmatullaev
FusionBrain Lab, Russia; Lomonosov Moscow State University, Russia
Zagorulko Ivan
Central University, Russia
Maxim Kurkin
FusionBrain Lab
Deep Learning, Computer Vision, Multimodal Learning
Andrey Kuznetsov
Head of FusionBrain Lab
AI, generative AI, multimodality, computer vision, digital forgery detection