🤖 AI Summary
To address two open problems in interactive 3D point cloud segmentation, namely poor generalization from sparse clicks and the lack of quantified uncertainty, this paper proposes a hierarchical probabilistic segmentation framework based on neural processes. The method introduces a two-level latent variable structure, with scene-level and object-level latent variables, integrated with point cloud feature encoding and hierarchical variational inference. A probabilistic prototype modulator is further designed to enable object-aware contextual modeling. Notably, it is the first approach in 3D interactive segmentation to support point-level calibrated uncertainty quantification. Evaluated on four mainstream benchmarks, the method achieves state-of-the-art segmentation accuracy while reducing the average click count by 37%; it also improves the accuracy with which uncertainty maps localize segmentation errors by 21.6%. These advances improve the efficiency of human-AI collaboration and the interpretability of the model.
📝 Abstract
Interactive 3D segmentation has emerged as a promising solution for generating accurate object masks in complex 3D scenes by incorporating user-provided clicks. However, two critical challenges remain underexplored: (1) effectively generalizing from sparse user clicks to produce accurate segmentation, and (2) quantifying predictive uncertainty to help users identify unreliable regions. In this work, we propose NPISeg3D, a novel probabilistic framework that builds upon Neural Processes (NPs) to address these challenges. Specifically, NPISeg3D introduces a hierarchical latent variable structure with scene-specific and object-specific latent variables to enhance few-shot generalization by capturing both global context and object-specific characteristics. Additionally, we design a probabilistic prototype modulator that adaptively modulates click prototypes with object-specific latent variables, improving the model's ability to capture object-aware context and quantify predictive uncertainty. Experiments on four 3D point cloud datasets demonstrate that NPISeg3D achieves superior segmentation performance with fewer clicks while providing reliable uncertainty estimations.
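The hierarchical latent structure and the uncertainty estimate described above can be illustrated with a toy sketch. This is not the NPISeg3D implementation: the function names, the mean-pooling encoder, the additive prototype modulation, the random weights standing in for trained networks, and the use of predictive entropy as the uncertainty measure are all assumptions made for illustration. It only shows the general pattern of sampling a scene-level latent, sampling an object-level latent conditioned on it and on the clicks, modulating a click prototype, and averaging over Monte Carlo samples.

```python
import numpy as np

def gaussian_params(features, w_mu, w_sigma):
    """Mean-pool a feature set and map it to a diagonal Gaussian (hypothetical encoder)."""
    pooled = features.mean(axis=0)
    mu = pooled @ w_mu
    # Softplus plus a floor keeps the standard deviation strictly positive.
    sigma = 0.1 + 0.9 * np.log1p(np.exp(pooled @ w_sigma))
    return mu, sigma

def sample(mu, sigma, rng):
    """Reparameterized draw: z = mu + sigma * eps, eps ~ N(0, I)."""
    return mu + sigma * rng.standard_normal(mu.shape)

def predict_with_uncertainty(points, click_feats, n_samples=20, seed=0):
    """Return per-point foreground probability and predictive entropy.

    points:      (N, d) per-point features
    click_feats: (K, d) features of the clicked points for one object
    """
    rng = np.random.default_rng(seed)
    d = points.shape[1]
    # Assumption: random weights stand in for trained encoder networks.
    w = {k: 0.1 * rng.standard_normal((d, d))
         for k in ("s_mu", "s_sig", "o_mu", "o_sig")}

    # Scene-level latent distribution, inferred from all points in the scene.
    mu_s, sig_s = gaussian_params(points, w["s_mu"], w["s_sig"])

    probs = []
    for _ in range(n_samples):
        z_s = sample(mu_s, sig_s, rng)
        # Object-level latent, conditioned on the scene latent and the clicks.
        mu_o, sig_o = gaussian_params(click_feats + z_s, w["o_mu"], w["o_sig"])
        z_o = sample(mu_o, sig_o, rng)
        # Assumption: the click prototype is modulated additively by z_o.
        prototype = click_feats.mean(axis=0) + z_o
        logits = points @ prototype
        probs.append(1.0 / (1.0 + np.exp(-logits)))

    mean_prob = np.stack(probs).mean(axis=0)
    eps = 1e-8
    # Binary predictive entropy (in nats) of the averaged prediction.
    entropy = -(mean_prob * np.log(mean_prob + eps)
                + (1.0 - mean_prob) * np.log(1.0 - mean_prob + eps))
    return mean_prob, entropy
```

In such a setup, points with high entropy would be natural candidates to show the user as unreliable regions, which is the role the abstract assigns to the uncertainty estimates.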