Resolving Long-Tail Ambiguity in Unsupervised 3D Point Cloud Segmentation with Language Priors

📅 2026-05-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
In unsupervised 3D point cloud segmentation, long-tailed categories are often absorbed by dominant classes during purely visual clustering, which leads to severely imbalanced predictions. To address this issue, this work proposes LangTail, a novel framework that, for the first time, incorporates balanced semantic priors from language models into unsupervised 3D segmentation. By aligning language and visual features at multiple levels, LangTail guides hierarchical clustering and explicitly associates entity-level semantics with rare categories, thereby strengthening the representations of minority classes. The method significantly outperforms existing approaches, with mIoU gains of +13.5, +12.9, and +8.9 on the ScanNet-v2, S3DIS, and nuScenes benchmarks, respectively.
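Because unsupervised methods output cluster IDs rather than class labels, mIoU figures like those above depend on first mapping clusters to ground-truth classes. A common protocol in unsupervised segmentation, assumed here since the summary does not describe LangTail's evaluation code, is a one-to-one Hungarian matching followed by per-class IoU. The minimal sketch below brute-forces the matching over permutations, which is only practical for a few classes; real evaluations typically use `scipy.optimize.linear_sum_assignment`. The function names are hypothetical.

```python
import itertools

import numpy as np


def confusion_matrix(pred, gt, num_classes):
    """cm[p, c] = number of points in predicted cluster p whose true class is c."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    np.add.at(cm, (np.asarray(pred), np.asarray(gt)), 1)
    return cm


def hungarian_miou(pred, gt, num_classes):
    """mIoU after a one-to-one cluster-to-class matching that maximizes correct points.

    Assumes as many clusters as classes. Brute-force matching is O(k!), so this
    is only meant for toy examples with a handful of classes.
    """
    cm = confusion_matrix(pred, gt, num_classes)
    # perm[c] is the cluster assigned to class c; keep the assignment with most hits.
    best_perm = max(
        itertools.permutations(range(num_classes)),
        key=lambda perm: sum(cm[p, c] for c, p in enumerate(perm)),
    )
    ious = []
    for c, p in enumerate(best_perm):
        tp = cm[p, c]
        fp = cm[p, :].sum() - tp  # points in cluster p that belong to other classes
        fn = cm[:, c].sum() - tp  # points of class c placed in other clusters
        denom = tp + fp + fn
        ious.append(float(tp / denom) if denom > 0 else float("nan"))
    return float(np.nanmean(ious)), ious
```

As a toy illustration of long-tail absorption: with ground truth `[0]*6 + [1]*2 + [2]*2` and predictions `[2]*8 + [1]*2` (the rare class 1 merged into the dominant cluster), point accuracy after matching is still 0.8, but class 1 gets an IoU of 0 and mIoU falls to about 0.58. This is why mIoU, unlike point accuracy, exposes minority classes being swallowed.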
📝 Abstract
Existing approaches for unsupervised 3D point cloud segmentation predominantly rely on a purely visual similarity-based learning-by-clustering paradigm, which suffers from a fundamental limitation: long-tail ambiguity. In such a paradigm, features of minor classes are consistently absorbed by dominant clusters, leading to severely imbalanced predictions. To address this issue, we propose LangTail, a language-guided hierarchical learning framework that leverages the balanced world knowledge encoded in language models to mitigate long-tail ambiguity in unsupervised 3D segmentation. The key idea is to establish multi-level associations between language-derived semantic priors and visually underrepresented minor classes, thereby compensating for the biased attention of purely visual clustering toward dominant classes. Specifically, LangTail first constructs an entity-level semantic prior from language models, capturing balanced and fine-grained world knowledge across categories. These priors are injected into a hierarchical clustering framework via contrastive alignment. This guides multi-granularity semantic structure formation and prevents minor classes from being absorbed by dominant clusters, yielding more discriminative representations for underrepresented categories. Extensive experiments on ScanNet-v2, S3DIS, and nuScenes demonstrate that LangTail consistently outperforms existing methods by significant margins, i.e., +13.5, +12.9, and +8.9 mIoU, respectively. These results highlight the effectiveness of language priors in improving the representation of minority classes in 3D point clouds. The code will be released at: https://github.com/Whisky0129/langtail_official.
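The abstract states that the language priors are injected into hierarchical clustering via contrastive alignment, but it does not give the loss. Purely as an illustration, the sketch below shows a generic InfoNCE-style alignment between point features and one language-derived prototype per category: each point is scored against every prototype by temperature-scaled cosine similarity and pulled toward its nearest one. The prototype embeddings, the nearest-prototype pseudo-labels, the temperature of 0.07, and all function names are assumptions, not LangTail's published design; random vectors stand in for real text-encoder outputs.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def log_softmax(logits):
    """Numerically stable row-wise log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def prototype_alignment_loss(point_feats, text_protos, temperature=0.07):
    """InfoNCE-style loss pulling each point toward its nearest text prototype.

    point_feats: (N, D) visual features.
    text_protos: (C, D) hypothetical language embeddings, one per entity name,
    assumed to be precomputed by some text encoder.
    Returns the mean loss and the pseudo-label (prototype index) of each point.
    """
    z = l2_normalize(point_feats)
    t = l2_normalize(text_protos)
    logits = z @ t.T / temperature        # (N, C) scaled cosine similarities
    targets = logits.argmax(axis=1)       # hypothetical pseudo-label: nearest prototype
    logp = log_softmax(logits)
    loss = -logp[np.arange(len(z)), targets].mean()
    return float(loss), targets
```

Because the prototype set holds one entry per category regardless of how many points that category covers, a rare class keeps its own anchor in the loss instead of competing for a share of a frequency-driven visual cluster; this is one plausible reading of the "balanced" prior the abstract describes. The hierarchical, multi-granularity part of the method is not modeled here.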
Problem

Research questions and friction points this paper is trying to address.

long-tail ambiguity
unsupervised 3D point cloud segmentation
class imbalance
minority classes
visual clustering
Innovation

Methods, ideas, or system contributions that make the work stand out.

language priors
unsupervised 3D segmentation
long-tail ambiguity
hierarchical clustering
contrastive alignment