Towards 3D Objectness Learning in an Open World

📅 2025-10-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses open-world 3D object detection: discovering all objects in a 3D scene, including novel categories unseen during training, a setting that conventional closed-set detectors handle poorly because they are bound to a fixed label set. It proposes OP3Det, a class-agnostic, text-prompt-free open-world 3D detector. Its core components are: (i) class-agnostic proposal generation that combines semantic priors from 2D foundation models, used for their generalization and zero-shot capabilities, with geometric priors from 3D point clouds; and (ii) a cross-modal Mixture-of-Experts (MoE) that dynamically routes uni-modal and multi-modal features drawn from point clouds and RGB images. In the reported experiments, OP3Det surpasses existing open-world 3D detectors by up to 16.0% in average recall (AR) and improves on closed-world 3D detectors by 13.5%.

📝 Abstract

3D object detection and novel category detection have made significant progress in recent years, yet research on learning generalized 3D objectness remains insufficient. In this paper, we delve into learning open-world 3D objectness, which focuses on detecting all objects in a 3D scene, including novel objects unseen during training. Traditional closed-set 3D detectors struggle to generalize to open-world scenarios, while directly adopting 3D open-vocabulary models to gain open-world ability runs into problems with vocabulary expansion and semantic overlap. To achieve generalized 3D object discovery, we propose OP3Det, a class-agnostic Open-World Prompt-free 3D Detector that detects any object within a 3D scene without relying on hand-crafted text prompts. We leverage the strong generalization and zero-shot capabilities of 2D foundation models, using both 2D semantic priors and 3D geometric priors to generate class-agnostic proposals that broaden 3D object discovery. Then, by integrating complementary information from point clouds and RGB images in a cross-modal mixture of experts, OP3Det dynamically routes uni-modal and multi-modal features to learn generalized 3D objectness. Extensive experiments demonstrate the strong performance of OP3Det, which surpasses existing open-world 3D detectors by up to 16.0% in AR and achieves a 13.5% improvement over closed-world 3D detectors.
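
The AR figures above are recall-based, so it may help to see what class-agnostic recall measures. The following is a minimal sketch, not the paper's evaluation protocol: it assumes axis-aligned boxes given as (xmin, ymin, zmin, xmax, ymax, zmax) and a single IoU threshold of 0.25, and the function names are hypothetical. Average recall is typically obtained by averaging such values over several thresholds or proposal budgets; the abstract does not state which.

```python
import numpy as np

def iou_3d_axis_aligned(a, b):
    """IoU of two axis-aligned 3D boxes (xmin, ymin, zmin, xmax, ymax, zmax).

    Assumption: boxes are axis-aligned; oriented boxes would need a different routine.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    lo = np.maximum(a[:3], b[:3])            # lower corner of the overlap region
    hi = np.minimum(a[3:], b[3:])            # upper corner of the overlap region
    inter = np.prod(np.clip(hi - lo, 0.0, None))
    union = np.prod(a[3:] - a[:3]) + np.prod(b[3:] - b[:3]) - inter
    return float(inter / union) if union > 0 else 0.0

def class_agnostic_recall(gt_boxes, proposals, iou_thr=0.25):
    """Fraction of ground-truth boxes matched by any proposal, ignoring class labels.

    The 0.25 threshold is an illustrative assumption, not a value from the paper.
    """
    if len(gt_boxes) == 0:
        return 0.0
    hits = sum(
        any(iou_3d_axis_aligned(g, p) >= iou_thr for p in proposals)
        for g in gt_boxes
    )
    return hits / len(gt_boxes)

# Two ground-truth objects; the single proposal covers half of the first one.
gt = [[0, 0, 0, 1, 1, 1], [5, 5, 5, 6, 6, 6]]
props = [[0, 0, 0, 1, 1, 0.5]]
print(class_agnostic_recall(gt, props))  # prints 0.5
```

Because labels are never consulted, a detector gets credit for localizing a novel object even when it has no name for it, which is the property open-world objectness is meant to capture.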

Problem

Research questions and friction points this paper is trying to address.

Detecting all objects in 3D scenes including unseen categories

Overcoming limitations of closed-set 3D detectors in open-world scenarios

Learning generalized 3D objectness without relying on text prompts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes a class-agnostic Open-World Prompt-free 3D Detector

Integrates 2D semantic and 3D geometric priors for object discovery

Uses cross-modal mixture of experts to learn generalized 3D objectness
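
The cross-modal mixture of experts listed above can be illustrated with a small routing sketch. This is a toy example under stated assumptions, not OP3Det's architecture: it assumes three experts (point-only, image-only, fused), a softmax gate computed from the concatenated features, and single linear layers with tanh as experts. All names (`cross_modal_moe`, `init_params`, the weight keys) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def init_params(dim, seed=0):
    """Random weights for the hypothetical gate and the three experts."""
    rng = np.random.default_rng(seed)
    return {
        "w_gate": rng.normal(scale=0.1, size=(2 * dim, 3)),
        "w_point": rng.normal(scale=0.1, size=(dim, dim)),
        "w_image": rng.normal(scale=0.1, size=(dim, dim)),
        "w_fused": rng.normal(scale=0.1, size=(2 * dim, dim)),
    }

def cross_modal_moe(point_feat, image_feat, params):
    """Blend uni-modal and multi-modal expert outputs with per-sample gate weights.

    point_feat, image_feat: arrays of shape (N, D), one row per proposal or location.
    Returns the mixed features (N, D) and the gate weights (N, 3).
    """
    fused_in = np.concatenate([point_feat, image_feat], axis=-1)   # (N, 2D)
    gate = softmax(fused_in @ params["w_gate"])                    # (N, 3)
    experts = np.stack(
        [
            np.tanh(point_feat @ params["w_point"]),   # geometry-only expert
            np.tanh(image_feat @ params["w_image"]),   # appearance-only expert
            np.tanh(fused_in @ params["w_fused"]),     # multi-modal expert
        ],
        axis=1,
    )                                                              # (N, 3, D)
    mixed = (gate[..., None] * experts).sum(axis=1)                # (N, D)
    return mixed, gate

rng = np.random.default_rng(1)
params = init_params(dim=8)
mixed, gate = cross_modal_moe(rng.normal(size=(5, 8)), rng.normal(size=(5, 8)), params)
print(mixed.shape, gate.shape)  # prints (5, 8) (5, 3)
```

The gate here is dense (every expert contributes to every sample); a sparse top-k router is a common alternative, and the source does not say which form OP3Det uses.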
