🤖 AI Summary
This work addresses the limitations of existing diffusion-based video adversarial purification methods, whose inefficient sampling and curved trajectories hinder effective recovery of perturbed content. The authors propose FMVP (Flow Matching for Adversarial Video Purification), built on a Masked Flow Matching (MFM) mechanism that disrupts global adversarial structures via physical masking and uses Conditional Flow Matching (CFM) with an inpainting objective to reconstruct clean video dynamics. A Frequency-Gated Loss (FGL) is also introduced to disentangle semantic content from adversarial noise. With Attack-Aware and Generalist training paradigms for known and unknown threats, respectively, the method achieves robust accuracy exceeding 87% against PGD and 89% against CW attacks on UCF-101 and HMDB-51, shows strong robustness against the adaptive DiffHammer attack, and serves as a zero-shot adversarial detector with AUC-ROC scores of 0.98 for PGD and 0.79 for CW perturbations.
📝 Abstract
Video recognition models remain vulnerable to adversarial attacks, while existing diffusion-based purification methods suffer from inefficient sampling and curved trajectories. Directly regressing clean videos from adversarial inputs often fails to recover faithful content because the perturbations are subtle, so the adversarial structure must first be physically shattered. We therefore propose Flow Matching for Adversarial Video Purification (FMVP). FMVP breaks up global adversarial structures via a masking strategy and reconstructs clean video dynamics using Conditional Flow Matching (CFM) with an inpainting objective. To further decouple semantic content from adversarial noise, we design a Frequency-Gated Loss (FGL) that explicitly suppresses high-frequency adversarial residuals while preserving low-frequency fidelity. We also design Attack-Aware and Generalist training paradigms to handle known and unknown threats, respectively. Extensive experiments on UCF-101 and HMDB-51 demonstrate that FMVP outperforms state-of-the-art methods (DiffPure, Defense Patterns (DP), Temporal Shuffling (TS), and FlowPure), achieving robust accuracy exceeding 87% against PGD and 89% against CW attacks. Furthermore, FMVP demonstrates superior robustness against adaptive attacks (DiffHammer) and functions as a zero-shot adversarial detector, attaining AUC-ROC scores of 0.98 for PGD and 0.79 for highly imperceptible CW attacks.
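The abstract names three ingredients (masking, a CFM objective, and a frequency-gated loss) without giving their exact form. The NumPy sketch below shows one plausible reading of each and is not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the `(T, H, W, C)` video layout, the patch-wise random mask, the straight-line probability path from Gaussian noise to the clean video, and the radial frequency gate with its `cutoff`, `low_weight`, and `high_weight` parameters.

```python
import numpy as np


def mask_video(video, mask_ratio=0.5, patch=4, rng=None):
    """Zero out random spatial patches in every frame (hypothetical masking scheme).

    video: array of shape (T, H, W, C); H and W are assumed divisible by `patch`.
    Returns (masked_video, mask), where mask is 1 for kept pixels and 0 for dropped ones.
    """
    rng = np.random.default_rng() if rng is None else rng
    t, h, w, _ = video.shape
    keep = (rng.random((t, h // patch, w // patch)) >= mask_ratio).astype(video.dtype)
    # Upsample the patch-level decision to pixel resolution and add a channel axis.
    mask = np.repeat(np.repeat(keep, patch, axis=1), patch, axis=2)[..., None]
    return video * mask, mask


def cfm_training_pair(x_clean, rng=None):
    """Sample one CFM regression target on a straight-line path (assumed path choice).

    The path is x_tau = (1 - tau) * x0 + tau * x_clean with x0 ~ N(0, I),
    so the target velocity is the constant x_clean - x0.
    """
    rng = np.random.default_rng() if rng is None else rng
    x0 = rng.standard_normal(x_clean.shape)
    tau = rng.random()  # flow time in [0, 1)
    x_tau = (1.0 - tau) * x0 + tau * x_clean
    v_target = x_clean - x0
    return x_tau, tau, v_target


def frequency_gated_loss(pred, target, cutoff=0.25, low_weight=1.0, high_weight=0.1):
    """Spectrally weighted squared error (one possible form of a frequency gate).

    The per-frame 2D spectrum of the error is weighted by `low_weight` inside a
    radial cutoff (in cycles per pixel) and by `high_weight` outside it.
    """
    err = np.fft.fft2(pred - target, axes=(1, 2), norm="ortho")
    fy = np.fft.fftfreq(pred.shape[1])[:, None]
    fx = np.fft.fftfreq(pred.shape[2])[None, :]
    gate = np.where(np.sqrt(fy**2 + fx**2) <= cutoff, low_weight, high_weight)
    return float(np.mean(gate[None, :, :, None] * np.abs(err) ** 2))
```

In a training loop under these assumptions, a velocity network would receive `x_tau`, `tau`, and the masked adversarial video as conditioning, and would be trained to regress `v_target` over the whole frame, which is what makes the objective an inpainting one. With `low_weight == high_weight == 1`, the gated loss reduces to the ordinary mean squared error, because the orthonormal FFT preserves energy.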