FoREST: Frame of Reference Evaluation in Spatial Reasoning Tasks

📅 2025-02-25

🤖 AI Summary

Problem: Current large language models (LLMs) show severe deficiencies in understanding Frames of Reference (FoR), a core aspect of spatial intelligence, and no dedicated, comprehensive benchmark exists to evaluate them. These deficiencies limit performance on spatial reasoning tasks such as text-to-layout generation. Method: We introduce FoREST, the first FoR-aware spatial reasoning benchmark, which covers absolute, relative, and intrinsic FoR types and comprises multidimensional spatial question answering and layout generation tasks. We further propose Spatial-Guided prompting, a method that combines structured prompt engineering with a cross-modal evaluation framework to improve FoR identification and spatial concept extraction. Contribution/Results: Extensive experiments reveal substantial performance disparities across FoR types among mainstream LLMs. Our approach achieves an average accuracy improvement of 12.7% on FoREST, offering a basis for evaluating and improving spatial intelligence in LLMs.

📝 Abstract

Spatial reasoning is a fundamental aspect of human intelligence. One key concept in spatial cognition is the Frame of Reference (FoR), which identifies the perspective of spatial expressions. Despite its significance, FoR has received limited attention in AI models that need spatial intelligence, and there is a lack of dedicated benchmarks and in-depth evaluation of large language models (LLMs) in this area. To address this issue, we introduce the Frame of Reference Evaluation in Spatial Reasoning Tasks (FoREST) benchmark, designed to assess FoR comprehension in LLMs. Using FoREST, we evaluate LLMs on two tasks: answering questions that require FoR comprehension, and generating layouts for text-to-image models. Our results reveal a notable performance gap across different FoR classes in various LLMs, which affects their ability to generate accurate layouts for text-to-image generation. This highlights critical shortcomings in FoR comprehension. To improve FoR understanding, we propose Spatial-Guided prompting, which improves LLMs' ability to extract essential spatial concepts. Our proposed method improves overall performance across spatial reasoning tasks.
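
To make the notion of a frame of reference concrete, the following minimal Python sketch shows how "to the left of" can resolve to different directions depending on the frame. It is an illustration only, not code from the FoREST paper: the top-down coordinate convention, the names `rotate_ccw` and `left_direction`, and the two frame labels used here are assumptions chosen for this example.

```python
from typing import Optional, Tuple

# Assumed top-down convention (hypothetical, for illustration):
# +x points to the viewer's right, +y points away from the viewer.
Vec = Tuple[int, int]

VIEWER_FACING: Vec = (0, 1)  # the viewer looks along +y


def rotate_ccw(v: Vec) -> Vec:
    """Rotate a 2D vector by 90 degrees counter-clockwise."""
    x, y = v
    return (-y, x)


def left_direction(frame: str, object_facing: Optional[Vec] = None) -> Vec:
    """Return the direction that 'left of the reference object' points to.

    frame == "relative":  left is taken from the viewer's perspective.
    frame == "intrinsic": left is taken from the reference object's own
                          facing direction, which must be supplied.
    """
    if frame == "relative":
        return rotate_ccw(VIEWER_FACING)
    if frame == "intrinsic":
        if object_facing is None:
            raise ValueError("intrinsic frame needs the object's facing direction")
        return rotate_ccw(object_facing)
    raise ValueError(f"unknown frame: {frame}")


# A car that faces the viewer has facing direction (0, -1).
print(left_direction("relative"))            # (-1, 0): the viewer's left
print(left_direction("intrinsic", (0, -1)))  # (1, 0): the viewer's right
```

When the reference object faces the viewer, the intrinsic reading places "left" on the viewer's right, so the same sentence yields two different layouts. This is the kind of ambiguity a layout generator has to resolve before placing objects.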

Problem

Research questions and friction points this paper is trying to address.

Evaluates Frame of Reference comprehension in AI models

Assesses spatial reasoning in large language models

Improves layout accuracy in text-to-image generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces FoREST benchmark for spatial reasoning

Evaluates LLMs on Frame of Reference comprehension

Proposes Spatial-Guided prompting for spatial concept extraction
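
The page does not reproduce the prompt used for Spatial-Guided prompting, so the sketch below is a guess at its general shape, not the authors' prompt. It assumes the idea is to have the model spell out the spatial concepts in a sentence before naming a frame of reference. The function name `build_spatial_guided_prompt`, the step wording, and the label set (taken from the FoR types named in the summary above) are all hypothetical.

```python
# Hypothetical label set, taken from the FoR types named in the summary.
FOR_LABELS = ("absolute", "relative", "intrinsic")


def build_spatial_guided_prompt(sentence: str) -> str:
    """Build a step-by-step prompt that asks for spatial concepts first
    and a frame-of-reference label last (illustrative wording only)."""
    steps = [
        "1. Name the located object and the reference object.",
        "2. State the spatial relation between them.",
        "3. State whose perspective the relation is described from.",
        f"4. Choose the frame of reference: {', '.join(FOR_LABELS)}.",
    ]
    return "\n".join([f"Sentence: {sentence}", "Answer step by step:", *steps])


if __name__ == "__main__":
    print(build_spatial_guided_prompt("A cat is to the left of a car."))
```

The resulting string can be sent to any LLM API. Putting the label request last means the model has already written down the objects, relation, and perspective by the time it has to commit to a frame.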
