Back to Point: Exploring Point-Language Models for Zero-Shot 3D Anomaly Detection

📅 2026-03-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of existing zero-shot 3D anomaly detection methods, which render point clouds into 2D images and consequently lose geometric detail, leaving them insufficiently sensitive to local anomalies. To overcome this, the authors propose the BTP (Back To Point) framework, which explores pretrained Point-Language Models (PLMs) for this task. BTP aligns multi-granularity point cloud patch features with textual embeddings, incorporates geometric descriptors to improve sensitivity to structural anomalies, and uses a joint representation learning strategy that leverages auxiliary point cloud data to improve robustness. Experiments on the Real3D-AD and Anomaly-ShapeNet benchmarks show that BTP achieves superior performance in zero-shot 3D anomaly detection.

📝 Abstract
Zero-shot (ZS) 3D anomaly detection is crucial for reliable industrial inspection, as it enables detecting and localizing defects without requiring any target-category training data. Existing approaches render 3D point clouds into 2D images and leverage pre-trained Vision-Language Models (VLMs) for anomaly detection. However, such strategies inevitably discard geometric details and exhibit limited sensitivity to local anomalies. In this paper, we revisit intrinsic 3D representations and explore the potential of pre-trained Point-Language Models (PLMs) for ZS 3D anomaly detection. We propose BTP (Back To Point), a novel framework that effectively aligns 3D point cloud and textual embeddings. Specifically, BTP aligns multi-granularity patch features with textual representations for localized anomaly detection, while incorporating geometric descriptors to enhance sensitivity to structural anomalies. Furthermore, we introduce a joint representation learning strategy that leverages auxiliary point cloud data to improve robustness and enrich anomaly semantics. Extensive experiments on Real3D-AD and Anomaly-ShapeNet demonstrate that BTP achieves superior performance in ZS 3D anomaly detection. Code will be available at https://github.com/wistful-8029/BTP-3DAD.
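
The abstract describes scoring point cloud patches by aligning their features with textual embeddings. The sketch below illustrates that general idea only; it is not the authors' implementation. It assumes each patch already has a feature vector in the same embedding space as two text embeddings (one for a "normal" prompt, one for an "anomalous" prompt), and it uses a softmax over cosine similarities, a scoring rule common in CLIP-style zero-shot anomaly detection. The function names, the temperature value, and the toy 2D vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-12)  # small epsilon guards against zero vectors


def patch_anomaly_scores(patch_feats, normal_text, anomalous_text, temperature=0.07):
    """Return, per patch, the softmax probability assigned to the 'anomalous' prompt.

    Assumption: patch features and text embeddings share one embedding space.
    The temperature is an illustrative value, not one taken from the paper.
    """
    scores = []
    for feat in patch_feats:
        s_normal = cosine(feat, normal_text) / temperature
        s_anomalous = cosine(feat, anomalous_text) / temperature
        m = max(s_normal, s_anomalous)  # subtract the max for numerical stability
        e_normal = math.exp(s_normal - m)
        e_anomalous = math.exp(s_anomalous - m)
        scores.append(e_anomalous / (e_normal + e_anomalous))
    return scores


# Toy data (hypothetical): 2D embeddings standing in for real model outputs.
normal_text = [1.0, 0.0]
anomalous_text = [0.0, 1.0]
patches = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]]

scores = patch_anomaly_scores(patches, normal_text, anomalous_text)
object_score = max(scores)  # object-level score: the most anomalous patch
```

Per-patch scores give a localization map over the point cloud, while taking the maximum is one simple way to obtain an object-level detection score. Multi-granularity alignment could, under the same assumptions, repeat this scoring at several patch sizes and combine the results.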
Problem

Research questions and friction points this paper is trying to address.

Zero-shot 3D anomaly detection
Point cloud
Anomaly localization
Geometric details
Industrial inspection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Point-Language Models
Zero-Shot 3D Anomaly Detection
Geometric Descriptors
Multi-Granularity Alignment
Joint Representation Learning
Kaiqiang Li
Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan, China
Gang Li
Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan, China
Mingle Zhou
Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan, China
Min Li
Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan, China
Delong Han
Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan, China
Jin Wan
Associate Professor of Computer Science and Technology, Qilu University of Technology
Computer vision, Machine learning