MCL-AD: Multimodal Collaboration Learning for Zero-Shot 3D Anomaly Detection

📅 2025-09-12
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Zero-shot 3D anomaly detection aims to identify defects in 3D objects without labeled anomalous samples, yet existing approaches are largely confined to unimodal point clouds and suffer from limited semantic representation capability. This paper proposes a multimodal collaborative framework for zero-shot 3D anomaly detection, jointly leveraging point clouds, RGB images, and textual priors. We introduce multimodal prompt learning and a collaborative modulation mechanism to achieve cross-modal semantic disentanglement and complementary feature fusion. Key innovations include: (i) object-agnostic disentangled text prompts, (ii) an RGB–point cloud dual-guided modulation network, and (iii) a multimodal contrastive loss. Our method achieves significant performance gains over state-of-the-art unimodal and multimodal baselines across multiple benchmarks, demonstrating that multimodal semantic collaboration critically enhances zero-shot generalization capability.
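The summary names a multimodal contrastive loss but does not give its form. The following is a minimal sketch, assuming a CLIP-style symmetric InfoNCE objective between paired point-cloud and RGB embeddings, where row i of each list describes the same object or region. The function names and the temperature of 0.07 are illustrative assumptions, not settings reported for MCL-AD.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def cosine_matrix(a: List[Vector], b: List[Vector]) -> List[List[float]]:
    """Pairwise cosine similarity: entry [i][j] compares a[i] with b[j]."""
    a_n = [l2_normalize(v) for v in a]
    b_n = [l2_normalize(v) for v in b]
    return [[sum(x * y for x, y in zip(u, w)) for w in b_n] for u in a_n]


def diagonal_cross_entropy(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(point_feats: List[Vector], rgb_feats: List[Vector],
                       temperature: float = 0.07) -> float:
    """Hypothetical point-cloud/RGB contrastive loss (CLIP-style form assumed)."""
    sims = cosine_matrix(point_feats, rgb_feats)
    logits = [[s / temperature for s in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]  # RGB -> point direction
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))


if __name__ == "__main__":
    points = [[1.0, 0.0], [0.0, 1.0]]
    rgb_matched = [[0.9, 0.1], [0.1, 0.9]]
    rgb_swapped = [[0.1, 0.9], [0.9, 0.1]]
    print(symmetric_info_nce(points, rgb_matched))  # small loss
    print(symmetric_info_nce(points, rgb_swapped))  # much larger loss
```

Matched pairs yield a lower loss than mismatched ones. Minimizing the loss therefore pulls corresponding point-cloud and RGB features together in a shared embedding space while pushing non-corresponding ones apart.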


πŸ“ Abstract
Zero-shot 3D (ZS-3D) anomaly detection aims to identify defects in 3D objects without relying on labeled training data, making it especially valuable in scenarios constrained by data scarcity, privacy, or high annotation cost. However, most existing methods focus exclusively on point clouds, neglecting the rich semantic cues available from complementary modalities such as RGB images and text priors. This paper introduces MCL-AD, a novel framework that leverages multimodal collaboration learning across point clouds, RGB images, and text semantics to achieve superior zero-shot 3D anomaly detection. Specifically, we propose a Multimodal Prompt Learning Mechanism (MPLM) that enhances intra-modal representation capability and inter-modal collaborative learning by introducing an object-agnostic decoupled text prompt and a multimodal contrastive loss. In addition, a Collaborative Modulation Mechanism (CMM) is proposed to fully leverage the complementary representations of point clouds and RGB images by jointly modulating the RGB image-guided and point cloud-guided branches. Extensive experiments demonstrate that the proposed MCL-AD framework achieves state-of-the-art performance in ZS-3D anomaly detection.
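The abstract does not spell out how anomaly scores are computed. A common recipe in CLIP-based zero-shot anomaly detection, sketched here purely as an assumption, compares each local feature (a point or pixel patch) with two object-agnostic text embeddings, one for "normal" and one for "anomalous". Because these prompts contain no class name, the same pair is shared across all object categories. The softmax probability of the "anomalous" prompt serves as the local score, and the maximum over local scores serves as the object-level score. All names, the toy 2-D vectors, and the temperature are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(u: Vector, w: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, w))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in w)))


def anomaly_probability(feature: Vector, normal_text: Vector, abnormal_text: Vector,
                        temperature: float = 0.07) -> float:
    """Two-way softmax over prompt similarities; returns P(anomalous)."""
    s_normal = cosine(feature, normal_text) / temperature
    s_abnormal = cosine(feature, abnormal_text) / temperature
    m = max(s_normal, s_abnormal)  # stabilize the exponentials
    e_normal = math.exp(s_normal - m)
    e_abnormal = math.exp(s_abnormal - m)
    return e_abnormal / (e_normal + e_abnormal)


def score_object(local_features: List[Vector], normal_text: Vector,
                 abnormal_text: Vector,
                 temperature: float = 0.07) -> Tuple[List[float], float]:
    """Local anomaly map plus an object-level score (max pooling is an assumption)."""
    local = [anomaly_probability(f, normal_text, abnormal_text, temperature)
             for f in local_features]
    return local, max(local)


if __name__ == "__main__":
    # Hypothetical object-agnostic prompt embeddings, reused for every category.
    normal_prompt = [1.0, 0.0]
    abnormal_prompt = [0.0, 1.0]
    features = [[1.0, 0.05], [0.1, 1.0]]  # one normal-looking, one defect-looking point
    local_map, object_score = score_object(features, normal_prompt, abnormal_prompt)
    print(local_map, object_score)
```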
Problem

Research questions and friction points this paper is trying to address.

Detects 3D object defects without labeled training data
Leverages multimodal collaboration across point clouds, RGB images, and text
Addresses neglect of semantic cues from complementary modalities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal collaboration learning across point clouds, RGB images, and text
Object-agnostic decoupled text prompt mechanism
RGB and point cloud joint modulation mechanism
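The page does not describe the internals of the joint modulation mechanism. The sketch below swaps in FiLM-style feature-wise affine modulation, applied in both directions. RGB features predict a scale and shift for point-cloud features (an RGB-guided branch), point-cloud features do the same for RGB features (a point cloud-guided branch), and the two outputs are averaged. The fixed weight matrices stand in for learned parameters, and both feature vectors are assumed to be aligned to the same location and to share one dimension. None of this is claimed to match MCL-AD's actual design.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def film(target: Vector, guide: Vector, w_gamma: Matrix, w_beta: Matrix) -> Vector:
    """FiLM-style modulation (swapped-in technique): guide predicts scale and shift."""
    gamma = matvec(w_gamma, guide)
    beta = matvec(w_beta, guide)
    # Using (1 + gamma) means all-zero weights leave the target unchanged.
    return [(1.0 + g) * t + b for t, g, b in zip(target, gamma, beta)]


def collaborative_modulation(point_feat: Vector, rgb_feat: Vector,
                             params: Dict[str, Matrix]) -> Vector:
    """Hypothetical bidirectional modulation; parameter names are illustrative."""
    rgb_guided = film(point_feat, rgb_feat,
                      params["rgb_to_pc_gamma"], params["rgb_to_pc_beta"])
    pc_guided = film(rgb_feat, point_feat,
                     params["pc_to_rgb_gamma"], params["pc_to_rgb_beta"])
    # Simple averaging fusion of the two branches (an assumption).
    return [0.5 * (a + b) for a, b in zip(rgb_guided, pc_guided)]


if __name__ == "__main__":
    zero = [[0.0, 0.0], [0.0, 0.0]]
    identity = [[1.0, 0.0], [0.0, 1.0]]
    params = {
        "rgb_to_pc_gamma": identity,  # RGB rescales the point-cloud feature
        "rgb_to_pc_beta": zero,
        "pc_to_rgb_gamma": zero,
        "pc_to_rgb_beta": identity,   # point cloud shifts the RGB feature
    }
    print(collaborative_modulation([1.0, 2.0], [3.0, 4.0], params))
```

With all weights at zero, each branch passes its target through unchanged and the output reduces to the average of the two inputs. Non-zero weights let each modality reshape the other, which is the intuition behind modulating both guided branches jointly.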
Gang Li
Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China and also with Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan, China
Tianjiao Chen
Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China and also with Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan, China
Mingle Zhou
Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China and also with Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan, China
Min Li
Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China and also with Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan, China
Delong Han
Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan, China and also with Shandong Provincial Key Laboratory of Computing Power Internet and Service Computing, Shandong Fundamental Research Center for Computer Science, Jinan, China
Jin Wan
Associate Professor of Computer Science and Technology, Qilu University of Technology
Computer vision, Machine learning