Multimodal Robust Prompt Distillation for 3D Point Cloud Models

📅 2025-11-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
3D point cloud models are highly vulnerable to adversarial attacks, and existing defense methods suffer from high computational overhead and poor generalization. Method: This paper proposes an efficient multimodal knowledge distillation framework that integrates teacher knowledge from three modalities—vision (depth-projected images), 3D (a high-performance point cloud model), and text (a CLIP text encoder)—using a confidence-gated dynamic weighting mechanism to guide a lightweight student model in learning robust prompts. Training introduces only a distillation loss; inference incurs zero additional overhead, requiring neither data augmentation nor architectural modifications. Contribution/Results: The framework significantly outperforms state-of-the-art defenses under diverse white-box and black-box attacks while maintaining superior clean-data classification accuracy, demonstrating strong robustness and generalization across attack scenarios.

📝 Abstract
Adversarial attacks pose a significant threat to learning-based 3D point cloud models, critically undermining their reliability in security-sensitive applications. Existing defense methods often suffer from (1) high computational overhead and (2) poor generalization across diverse attack types. To bridge these gaps, we propose a novel yet efficient teacher-student framework, namely Multimodal Robust Prompt Distillation (MRPD), for distilling robust 3D point cloud models. It learns lightweight prompts by aligning the student point cloud model's features with robust embeddings from three distinct teachers: a vision model processing depth projections, a high-performance 3D model, and a text encoder. To ensure reliable knowledge transfer, the distillation is guided by a confidence-gated mechanism that dynamically balances the contribution of all input modalities. Notably, since distillation happens entirely during the training stage, there is no additional computational cost at inference. Extensive experiments demonstrate that MRPD substantially outperforms state-of-the-art defense methods against a wide range of white-box and black-box attacks, while even achieving better performance on clean data. Our work presents a new, practical paradigm for building robust 3D vision systems by efficiently harnessing multimodal knowledge.
Problem

Research questions and friction points this paper is trying to address.

Defending 3D point cloud models against adversarial attacks
Reducing computational overhead of existing defense methods
Improving generalization across diverse attack types
Innovation

Methods, ideas, or system contributions that make the work stand out.

Teacher-student framework distills robust 3D point cloud model
Aligns student features with multimodal teacher embeddings
Confidence-gated mechanism balances multimodal knowledge transfer
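The confidence-gated weighting described above can be sketched as follows. This is a minimal illustration, not the paper's formulation: the use of max-softmax probability as the confidence gate and cosine distance as the feature-alignment loss are assumptions, and the function and variable names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def confidence_gated_distill_loss(student_feat, teacher_feats, teacher_logits, tau=1.0):
    """Feature-alignment distillation loss where each teacher's contribution
    is gated by its prediction confidence (here: max softmax probability).

    student_feat   : (d,) student embedding
    teacher_feats  : list of (d,) teacher embeddings (vision / 3D / text)
    teacher_logits : list of per-teacher classification logits
    Returns (loss, weights): scalar loss and the normalized gate weights.
    """
    # Gate: more confident teachers get larger weight; normalize to sum to 1.
    confs = np.array([softmax(l / tau).max() for l in teacher_logits])
    weights = confs / confs.sum()

    loss = 0.0
    s = student_feat / np.linalg.norm(student_feat)
    for w, t in zip(weights, teacher_feats):
        tn = t / np.linalg.norm(t)
        loss += w * (1.0 - float(s @ tn))  # cosine distance to each teacher
    return loss, weights
```

Because the gates are computed from teacher outputs only, this loss adds cost at training time but nothing at inference, consistent with the paper's zero-overhead claim.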
Authors

Xiang Gu, Xi'an Jiaotong University (transfer learning, optimal transport, generative models)
Liming Lu, Nanjing University of Science and Technology
Xu Zheng, The Hong Kong University of Science and Technology (Guangzhou)
Anan Du, Nanjing University of Industry Technology
Yongbin Zhou, Nanjing University of Science and Technology
Shuchao Pang, University of New South Wales (medical image analysis, deep learning)