🤖 AI Summary
This work addresses the limited generalization of zero-shot 3D anomaly detection to unseen categories by proposing the first CLIP-based unified framework. Methodologically, it introduces point-pixel joint modeling to fuse rendering and geometric semantics; designs explicit and implicit dual-path anomaly representations; employs hierarchical text prompts, covering both rendering-aware and geometry-aware prompts, together with a cross-hierarchy contrastive alignment mechanism; and incorporates G-aggregation to enhance geometric awareness. The framework supports plug-and-play integration of the RGB modality. Evaluated on unseen objects with highly diverse class semantics, it significantly improves both anomaly detection and segmentation, achieving state-of-the-art results across multiple benchmarks. Notably, it is the first method to enable simultaneous fine-grained spatial localization and holistic anomaly understanding within a generalizable zero-shot 3D anomaly modeling paradigm.
📝 Abstract
In this paper, we aim to transfer CLIP's robust 2D generalization capability to identify 3D anomalies across unseen objects with highly diverse class semantics. To this end, we propose a unified framework that comprehensively detects and segments 3D anomalies by leveraging both point- and pixel-level information. We first design PointAD, which leverages point-pixel correspondence to represent 3D anomalies through their associated rendering-pixel representations. We refer to this approach as implicit 3D representation, as it focuses solely on rendering-pixel anomalies and neglects the inherent spatial relationships within point clouds. We then propose PointAD+ to further broaden the interpretation of 3D anomalies by introducing explicit 3D representation, which emphasizes spatial abnormality to uncover abnormal spatial relationships. Accordingly, we propose G-aggregation, which incorporates geometric information to make the aggregated point representations spatially aware. To capture rendering and spatial abnormality simultaneously, PointAD+ introduces hierarchical representation learning, incorporating implicit and explicit anomaly semantics into hierarchical text prompts: rendering prompts for the rendering layer and geometry prompts for the geometry layer. A cross-hierarchy contrastive alignment is further introduced to promote interaction between the rendering and geometry layers, facilitating mutual anomaly learning. Finally, PointAD+ integrates the anomaly semantics from both layers to capture generalized anomaly semantics. At test time, PointAD+ can integrate RGB information in a plug-and-play manner, further improving its detection performance. Extensive experiments demonstrate the superiority of PointAD+ in zero-shot 3D anomaly detection across unseen objects with highly diverse class semantics, achieving a holistic understanding of abnormality.
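The high-level recipe in the abstract, scoring rendered pixels against normal/abnormal text prompts (implicit representation) and then smoothing scores over spatial neighbors (in the spirit of G-aggregation), can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the tensor shapes, the softmax-over-cosine-similarity scoring, the random stand-in features, and the k-NN averaging used here as a stand-in for geometry-aware aggregation are all assumptions.

```python
# Illustrative sketch of zero-shot 3D anomaly scoring via point-pixel
# correspondence. Random vectors stand in for CLIP features; the k-NN
# smoothing is only a simplified stand-in for G-aggregation.
import numpy as np

def cosine_sim(a, b):
    """Row-wise cosine similarity between two feature matrices."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T

def pixel_anomaly_scores(pixel_feats, text_feats):
    """Score each rendered pixel against [normal, abnormal] text embeddings."""
    sims = cosine_sim(pixel_feats, text_feats)                 # (N, 2)
    probs = np.exp(sims) / np.exp(sims).sum(-1, keepdims=True)  # softmax
    return probs[:, 1]                                          # P(abnormal)

def g_aggregate(points, scores, k=4):
    """Geometry-aware smoothing: average each point's score over its
    k nearest spatial neighbours (simplified stand-in for G-aggregation)."""
    d = np.linalg.norm(points[:, None] - points[None], axis=-1)  # pairwise dists
    nn = np.argsort(d, axis=1)[:, :k]                            # k-NN indices
    return scores[nn].mean(axis=1)

rng = np.random.default_rng(0)
N, D = 64, 32
points = rng.normal(size=(N, 3))        # 3D point cloud
pixel_feats = rng.normal(size=(N, D))   # stand-in CLIP features of rendered pixels
text_feats = rng.normal(size=(2, D))    # stand-in "normal"/"anomaly" prompt embeddings

implicit = pixel_anomaly_scores(pixel_feats, text_feats)  # rendering-level scores
final = g_aggregate(points, implicit)                     # geometry-smoothed scores
print(final.shape)
```

In the actual method, the pixel features would come from CLIP applied to multi-view renderings, mapped back to points via the point-pixel correspondence, and the rendering and geometry layers would be fused rather than chained as here.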