ZIUM: Zero-Shot Intent-Aware Adversarial Attack on Unlearned Models

📅 2025-07-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Adversarial prompts can bypass concept deletion and reconstruct forgotten content in machine unlearning (MU) models, posing serious security risks. Method: This paper proposes a zero-shot, intent-aware adversarial attack that requires neither iterative optimization nor additional training; instead, it uses an intent-driven prompt generation mechanism to flexibly specify attack targets and efficiently trigger forgotten concepts. Contribution/Results: To the best of the authors' knowledge, this is the first zero-shot, intent-customizable attack against MU models, improving both attack flexibility and efficiency. Extensive experiments across diverse unlearning scenarios show that the proposed method achieves higher attack success rates than state-of-the-art approaches while substantially reducing attack latency, establishing a new evaluation paradigm for assessing the security of MU models.

📝 Abstract
Machine unlearning (MU) removes specific data points or concepts from deep learning models to enhance privacy and prevent sensitive content generation. Adversarial prompts can exploit unlearned models to generate content containing removed concepts, posing a significant security risk. However, existing adversarial attack methods still face challenges in generating content that aligns with an attacker's intent while incurring high computational costs to identify successful prompts. To address these challenges, we propose ZIUM, a Zero-shot Intent-aware adversarial attack on Unlearned Models, which enables the flexible customization of target attack images to reflect an attacker's intent. Additionally, ZIUM supports zero-shot adversarial attacks without requiring further optimization for previously attacked unlearned concepts. The evaluation across various MU scenarios demonstrated ZIUM's effectiveness in successfully customizing content based on user-intent prompts while achieving a superior attack success rate compared to existing methods. Moreover, its zero-shot adversarial attack significantly reduces the attack time for previously attacked unlearned concepts.
Problem

Research questions and friction points this paper is trying to address.

Exploiting unlearned models with adversarial prompts
High computational cost in generating intent-aligned content
Lack of zero-shot attacks for unlearned concepts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot intent-aware adversarial attack customization
Flexible target attack image customization
Reduces attack time for unlearned concepts
Hyun Jun Yook
Chung-Ang University
Ga San Jhun
Chung-Ang University
Jae Hyun Cho
Chung-Ang University
Min Jeon
Chung-Ang University
Donghyun Kim
Korea University
Tae Hyung Kim
Hongik University
Youn Kyu Lee
Chung-Ang University
Software Architecture · AI · Security