Revisiting 3D LLM Benchmarks: Are We Really Testing 3D Capabilities?

📅 2025-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper identifies a “2D-Cheating” problem in current 3D large language model (LLM) evaluation: many tasks in mainstream benchmarks can be solved by vision-language models (VLMs) given only rendered 2D images of the point clouds, so high accuracy is achievable without genuine 3D understanding and 3D capabilities risk being overestimated. To address this, the authors characterize this evaluation shortcut and propose a “capability decoupling” principle that explicitly separates 1D/2D perception from intrinsic 3D geometric and spatial reasoning. By testing VLMs across multiple 3D LLM benchmarks and using their performance as a reference, they show that many existing 3D benchmark tasks are effectively solvable as 2D tasks. The work proposes principles for assessing genuine 3D understanding and for designing evaluations that are aware of 3D structure.

📝 Abstract
In this work, we identify the "2D-Cheating" problem in 3D LLM evaluation, where these tasks might be easily solved by VLMs with rendered images of point clouds, exposing ineffective evaluation of 3D LLMs' unique 3D capabilities. We test VLM performance across multiple 3D LLM benchmarks and, using this as a reference, propose principles for better assessing genuine 3D understanding. We also advocate explicitly separating 3D abilities from 1D or 2D aspects when evaluating 3D LLMs.
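
The idea of using VLM performance as a reference can be illustrated with a minimal sketch. It is not the authors' procedure: the `BenchmarkScores` type, the `vlm_recovery_ratio` and `flag_2d_solvable` helpers, the 0.9 threshold, and all numbers are hypothetical and chosen only for illustration. The sketch flags a benchmark as likely solvable from 2D input when an image-only VLM recovers most of a 3D LLM's score.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkScores:
    name: str
    llm3d_score: float  # score of a 3D LLM given the point cloud
    vlm_score: float    # score of a VLM given only rendered images


def vlm_recovery_ratio(scores: BenchmarkScores) -> float:
    """Fraction of the 3D LLM's score that the image-only VLM reaches."""
    if scores.llm3d_score <= 0:
        raise ValueError("llm3d_score must be positive")
    return scores.vlm_score / scores.llm3d_score


def flag_2d_solvable(benchmarks, threshold=0.9):
    """Return names of benchmarks where the VLM reaches at least
    `threshold` of the 3D LLM's score (an assumed cutoff, not from the paper)."""
    return [b.name for b in benchmarks if vlm_recovery_ratio(b) >= threshold]


# Placeholder numbers for illustration only; these are not results from the paper.
example = [
    BenchmarkScores("benchmark_a", llm3d_score=50.0, vlm_score=49.0),
    BenchmarkScores("benchmark_b", llm3d_score=60.0, vlm_score=30.0),
]
print(flag_2d_solvable(example))
```

Under these assumptions, a benchmark that gets flagged says little about 3D-specific ability, which is the kind of case the abstract argues should be separated from 1D or 2D aspects.
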
Problem

Research questions and friction points this paper is trying to address.

Identifying '2D-Cheating' in 3D LLM evaluation
Testing VLM performance on 3D benchmarks
Proposing principles for genuine 3D understanding assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Identify 2D-Cheating in 3D LLM evaluation
Propose principles for genuine 3D understanding
Separate 3D abilities from 1D or 2D aspects