GLRD: Global-Local Collaborative Reason and Debate with PSL for 3D Open-Vocabulary Detection

📅 2025-03-26
🤖 AI Summary
3D open-vocabulary detection (OVD) on LiDAR point clouds is difficult because novel categories have no off-the-shelf training labels and visually similar objects are hard to tell apart. Method: We propose a global-local collaborative reasoning and multi-round debate framework. Specifically, we integrate large language models' (LLMs) commonsense reasoning with a probabilistic soft logic solver (OV-PSL) for interpretable semantic inference; design static and dynamic balance schemes (SBC/DBC) to mitigate the long-tail class distribution; and introduce Reflected Pseudo Labels Generation (RPLG) and Background-Aware Object Localization (BAOL) to reduce the influence of noise in data and training. Results: In the full open-vocabulary setting, our method achieves absolute mAP improvements of 4.03% on ScanNet and 14.11% on SUN RGB-D over prior approaches. To the best of our knowledge, this is the first end-to-end 3D OVD framework enabling joint scene-level and object-level representation learning and inference.

πŸ“ Abstract
LiDAR-based 3D Open-Vocabulary Detection (3D OVD) requires a detector to learn to detect novel objects from point clouds without off-the-shelf training labels. Previous methods focus on learning object-level representations and ignore scene-level information, which makes it hard to distinguish objects of similar classes. In this work, we propose a Global-Local Collaborative Reason and Debate with PSL (GLRD) framework for the 3D OVD task that considers both local object-level information and global scene-level information. Specifically, an LLM performs commonsense reasoning over object-level and scene-level information, and the detection result is refined accordingly. To further improve the precision of the LLM's decisions, we also design a probabilistic soft logic solver (OV-PSL) to search for the optimal solution, and a debate scheme to confirm the class of confusable objects. To alleviate the uneven distribution of classes, we design a static balance scheme (SBC) and a dynamic balance scheme (DBC). To reduce the influence of noise in data and training, we further propose Reflected Pseudo Labels Generation (RPLG) and Background-Aware Object Localization (BAOL). Extensive experiments on ScanNet and SUN RGB-D demonstrate the superiority of GLRD: in the partial open-vocabulary setting, the absolute improvements in mean average precision are $+2.82\%$ on SUN RGB-D and $+3.72\%$ on ScanNet; in the full open-vocabulary setting, they are $+4.03\%$ on ScanNet and $+14.11\%$ on SUN RGB-D.

Problem

Research questions and friction points this paper is trying to address.

Detect novel 3D objects without training labels
Distinguish objects with similar classes using scene-level info
Improve detection accuracy with LLM reasoning and debate

Innovation

Methods, ideas, or system contributions that make the work stand out.

Global-local reasoning with LLM for 3D detection
Probabilistic soft logic solver for optimal solutions
Debate scheme to resolve confusable object classes
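
To make the idea of combining object-level and scene-level evidence through probabilistic soft logic more concrete, here is a minimal toy sketch. It is not the paper's OV-PSL solver: the two rules, the weights, the `compat` scores (imagined as LLM-supplied scene priors), and all numbers are hypothetical, and a real PSL solver optimizes continuous truth values rather than enumerating one-hot labels. The only standard ingredients are the Lukasiewicz relaxation of AND and the hinge-style "distance to satisfaction" of a rule.

```python
"""Toy probabilistic soft logic (PSL) example with Lukasiewicz relaxations.

All rules, weights, names, and numbers are hypothetical illustrations;
this is not the OV-PSL solver described in the paper.
"""


def soft_and(a: float, b: float) -> float:
    """Lukasiewicz conjunction of two truth values in [0, 1]."""
    return max(0.0, a + b - 1.0)


def rule_penalty(body: float, head: float) -> float:
    """Distance to satisfaction of the soft rule body -> head."""
    return max(0.0, body - head)


def total_penalty(label, detector, scene_conf, compat, w_det=1.0, w_scene=1.0):
    """Weighted penalty of assigning `label` to one object (one-hot assignment).

    Assumed rules, grounded for every candidate class c:
      w_det:   Detector(c)        -> Label(c)
      w_scene: Scene & Label(c)   -> Compatible(scene, c)
    """
    penalty = 0.0
    for c in detector:
        y = 1.0 if c == label else 0.0  # truth value of Label(c)
        penalty += w_det * rule_penalty(detector[c], y)
        penalty += w_scene * rule_penalty(soft_and(scene_conf, y), compat[c])
    return penalty


def best_label(detector, scene_conf, compat, w_det=1.0, w_scene=1.0):
    """Pick the candidate class with the lowest total rule penalty."""
    return min(
        detector,
        key=lambda c: total_penalty(c, detector, scene_conf, compat, w_det, w_scene),
    )


# Hypothetical inputs: a confusable object in a scene believed to be a bathroom.
detector = {"chair": 0.55, "toilet": 0.45}  # object-level (local) scores
compat = {"chair": 0.20, "toilet": 0.95}    # scene-level (global) compatibility
scene_conf = 0.90                           # confidence in the scene type

local_only = best_label(detector, scene_conf, compat, w_scene=0.0)
global_local = best_label(detector, scene_conf, compat)
print(local_only, global_local)  # prints: chair toilet
```

With the scene rule switched off, the slightly higher detector score wins and the object is labeled `chair`; once the scene rule is weighted in, the bathroom context makes `toilet` the lower-penalty assignment. This mirrors, in a very reduced form, how scene-level information can separate objects of similar classes.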