GLRD: Global-Local Collaborative Reason and Debate with PSL for 3D Open-Vocabulary Detection

📅 2025-03-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: 3D open-vocabulary detection (OVD) on LiDAR point clouds is hard because novel categories have no ready-made training labels and visually similar objects are difficult to tell apart from object-level features alone. Method: We propose a global-local collaborative reasoning and multi-round debate framework. Specifically, we combine the commonsense reasoning of large language models (LLMs) with a probabilistic soft logic solver (OV-PSL) for interpretable semantic inference; design static and dynamic class balancing (SBC/DBC) to mitigate the long-tail class distribution; and introduce reflected pseudo-label generation (RPLG) and background-aware object localization (BAOL) to make training more robust to noise. Results: In the full open-vocabulary setting, our method achieves absolute mAP improvements of 4.03% on ScanNet and 14.11% on SUN RGB-D over prior approaches. To the best of our knowledge, this is the first end-to-end 3D OVD framework enabling joint scene-level and object-level representation learning and inference.

📝 Abstract

LiDAR-based 3D Open-Vocabulary Detection (3D OVD) requires a detector to learn to detect novel objects from point clouds without off-the-shelf training labels. Previous methods focus on learning object-level representations and ignore scene-level information, which makes it hard to distinguish objects of similar classes. In this work, we propose a Global-Local Collaborative Reason and Debate with PSL (GLRD) framework for the 3D OVD task that considers both local object-level information and global scene-level information. Specifically, an LLM performs common-sense reasoning over object-level and scene-level information, and the detection result is refined accordingly. To further improve the LLM's ability to make precise decisions, we also design a probabilistic soft logic solver (OV-PSL) to search for the optimal solution, and a debate scheme to confirm the class of confusable objects. To alleviate the uneven distribution of classes, a static balance scheme (SBC) and a dynamic balance scheme (DBC) are designed. To reduce the influence of noise in data and training, we further propose Reflected Pseudo Labels Generation (RPLG) and Background-Aware Object Localization (BAOL). Extensive experiments on ScanNet and SUN RGB-D demonstrate the superiority of GLRD: in the partial open-vocabulary setting, the absolute improvements in mean average precision are $+2.82\%$ on SUN RGB-D and $+3.72\%$ on ScanNet; in the full open-vocabulary setting, they are $+4.03\%$ on ScanNet and $+14.11\%$ on SUN RGB-D.
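
To make the idea of combining scene-level context with object-level scores through probabilistic soft logic more concrete, here is a minimal toy sketch. It is not the paper's OV-PSL solver: the rules, the weights, the `infer_toilet_truth` helper, and the grid search are all illustrative assumptions. It relies only on the standard PSL formulation, in which truth values lie in [0, 1], conjunction uses the Łukasiewicz relaxation, and each rule `body -> head` is penalized by its distance to satisfaction `max(0, body - head)`.

```python
def luk_and(a: float, b: float) -> float:
    """Lukasiewicz conjunction used in PSL: max(0, a + b - 1)."""
    return max(0.0, a + b - 1.0)


def hinge(body: float, head: float) -> float:
    """Distance to satisfaction of the soft rule body -> head."""
    return max(0.0, body - head)


def infer_toilet_truth(det_toilet: float, det_chair: float,
                       scene_bathroom: float, scene_weight: float = 2.0) -> float:
    """Hypothetical helper: soft truth y of Label(box, toilet).

    Assumes exactly two candidate classes, so Label(box, chair) = 1 - y.
    Rules (all invented for illustration):
      1.0 : Detector(toilet) -> Label(toilet)
      1.0 : Detector(chair)  -> Label(chair)
      w   : Scene(bathroom) AND Candidate(toilet) -> Label(toilet)
    """
    # Candidate(toilet) is taken as fully true (1.0) in this toy example.
    scene_body = luk_and(scene_bathroom, 1.0)

    def loss(y: float) -> float:
        return (hinge(det_toilet, y)
                + hinge(det_chair, 1.0 - y)
                + scene_weight * hinge(scene_body, y))

    # Real PSL inference is a convex optimization; a coarse grid search
    # over the single free variable is enough for this sketch.
    grid = [i / 100 for i in range(101)]
    return min(grid, key=loss)


# Object-level scores slightly favor "chair", but the scene is likely a bathroom.
y = infer_toilet_truth(det_toilet=0.45, det_chair=0.55, scene_bathroom=0.9)
label = "toilet" if y >= 0.5 else "chair"
print(y, label)
```

In this toy setting the weighted scene rule pulls the decision away from the class that object-level scores alone would pick, which is the kind of global-local interaction the abstract describes. Setting `scene_weight=0.0` removes the scene rule and leaves the decision to the detector scores.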

Problem

Research questions and friction points this paper is trying to address.

Detect novel 3D objects without training labels

Distinguish objects with similar classes using scene-level info

Improve detection accuracy with LLM reasoning and debate

Innovation

Methods, ideas, or system contributions that make the work stand out.

Global-local reasoning with LLM for 3D detection

Probabilistic soft logic solver for optimal solutions

Debate scheme to resolve confusable object classes

🔎 Similar Papers

Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model