OpenGround: Active Cognition-based Reasoning for Open-World 3D Visual Grounding

📅 2025-12-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing 3D visual grounding methods rely on predefined object lookup tables (OLTs), limiting generalization to unseen categories. This work proposes a zero-shot open-world 3D visual referring localization framework that eliminates OLT dependency. We introduce an Active Cognitive Reasoning (ACR) module that emulates human perceptual chains to dynamically expand the cognitive capacity of vision-language models (VLMs). We also construct OpenTarget—the first open-world 3D referring benchmark—and unify predefined and open-category modeling. Our approach integrates VLMs, cognitive task-chain modeling, dynamic OLT updating, and zero-shot cross-domain transfer, enabling fine-grained 3D point cloud–language alignment. Experiments show competitive performance on Nr3D, state-of-the-art results on ScanRefer, and a 17.6% absolute accuracy improvement on OpenTarget.

📝 Abstract
3D visual grounding aims to locate objects based on natural language descriptions in 3D scenes. Existing methods rely on a pre-defined Object Lookup Table (OLT) to query Visual Language Models (VLMs) for reasoning about object locations, which limits the applications in scenarios with undefined or unforeseen targets. To address this problem, we present OpenGround, a novel zero-shot framework for open-world 3D visual grounding. Central to OpenGround is the Active Cognition-based Reasoning (ACR) module, which is designed to overcome the fundamental limitation of pre-defined OLTs by progressively augmenting the cognitive scope of VLMs. The ACR module performs human-like perception of the target via a cognitive task chain and actively reasons about contextually relevant objects, thereby extending VLM cognition through a dynamically updated OLT. This allows OpenGround to function with both pre-defined and open-world categories. We also propose a new dataset named OpenTarget, which contains over 7000 object-description pairs to evaluate our method in open-world scenarios. Extensive experiments demonstrate that OpenGround achieves competitive performance on Nr3D, state-of-the-art on ScanRefer, and delivers a substantial 17.6% improvement on OpenTarget. Project Page at [this https URL](https://why-102.github.io/openground.io/).
Problem

Research questions and friction points this paper is trying to address.

Addresses open-world 3D visual grounding without predefined object categories
Overcomes limitations of fixed object lookup tables in existing methods
Enables zero-shot localization of undefined or unforeseen 3D objects
Innovation

Methods, ideas, or system contributions that make the work stand out.

Active Cognition-based Reasoning for open-world 3D grounding
Dynamically updated Object Lookup Table to extend VLM cognition
Zero-shot framework handling both pre-defined and unforeseen targets
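The dynamic-OLT idea in the bullets above can be sketched as a simple loop: try to resolve the description against the current lookup table, and when the target is not covered, actively reason about contextually relevant categories and add them before retrying. This is only an illustrative sketch of that control flow, not the paper's implementation; `ground_target` and `reason_related` are hypothetical names, and the VLM calls are stubbed out.

```python
# Illustrative sketch of the dynamically updated OLT described above.
# All names here (ground_target, reason_related) are hypothetical;
# in the actual method a VLM would perform the matching and reasoning.

def ground_target(description, scene_objects, olt, reason_related, max_rounds=3):
    """Iteratively expand the object lookup table (OLT) until the
    described target can be matched, then return the matched object."""
    for _ in range(max_rounds):
        # Resolve the description against objects whose category the OLT covers.
        candidates = [obj for obj in scene_objects if obj["category"] in olt]
        hit = next((c for c in candidates if c["category"] in description), None)
        if hit is not None:
            return hit
        # Target not covered: actively reason about contextually relevant
        # categories and merge them into the OLT (the ACR-style step).
        olt |= reason_related(description, olt)
    return None


# Toy usage: the initial OLT misses the described category, so the
# reasoning step must expand it before grounding succeeds.
olt = {"chair", "table"}
scene = [{"category": "chair"}, {"category": "fire extinguisher"}]

def reason_related(desc, current_olt):
    # Stub for VLM-driven reasoning about contextually relevant objects.
    return {"fire extinguisher", "door"}

result = ground_target(
    "the red fire extinguisher near the door", scene, olt, reason_related
)
```

Here the first round fails because "fire extinguisher" is absent from the OLT; after the reasoning step expands the table, the second round grounds the target.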
Wenyuan Huang
Nanjing University, School of Intelligent Science and Technology
Zhao Wang
China Mobile Zijin Innovation Institute
Zhou Wei
Yunnan University of Finance and Economics
Risk analysis, Financial Engineering, Finance Safety, Decision science.
Ting Huang
Nanjing University, School of Intelligent Science and Technology
Fang Zhao
Nanjing University, School of Intelligent Science and Technology
Jian Yang
Nanjing University, School of Intelligent Science and Technology
Zhenyu Zhang
Nanjing University, School of Intelligent Science and Technology