Cross-Modal Transferable Image-to-Video Attack on Video Quality Metrics

πŸ“… 2025-01-14
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Video Quality Assessment (VQA) models are vulnerable to adversarial attacks in black-box settings; however, existing methods suffer from poor cross-modal transferability and low attack efficiency. To address this, we propose IC2VQA, the first cross-modal adversarial attack framework that transfers white-box image-domain attacks to black-box VQA models. IC2VQA jointly leverages an Image Quality Assessment Model (IQAM) and CLIP to generate semantically consistent, transferable perturbations via gradient migration and cross-modal feature alignment, thereby artificially inflating the scores produced by black-box video quality metrics. Evaluated on three mainstream black-box VQA models, IC2VQA achieves an attack success rate exceeding 86%. Under identical perturbation budgets and iteration counts, it significantly outperforms state-of-the-art black-box attack methods. This work exposes critical vulnerabilities in current VQA systems and establishes a new benchmark for cross-modal adversarial robustness evaluation.

πŸ“ Abstract
Recent studies have revealed that modern image and video quality assessment (IQA/VQA) metrics are vulnerable to adversarial attacks. An attacker can manipulate a video through preprocessing to artificially increase its quality score according to a certain metric, despite no actual improvement in visual quality. Most of the attacks studied in the literature are white-box attacks, while black-box attacks in the context of VQA have received less attention. Moreover, some research indicates a lack of transferability of adversarial examples generated for one model to another when applied to VQA. In this paper, we propose a cross-modal attack method, IC2VQA, aimed at exploring the vulnerabilities of modern VQA models. This approach is motivated by the observation that the low-level feature spaces of images and videos are similar. We investigate the transferability of adversarial perturbations across different modalities; specifically, we analyze how adversarial perturbations generated on a white-box IQA model with an additional CLIP module can effectively target a VQA model. The addition of the CLIP module serves as a valuable aid in increasing transferability, as the CLIP model is known for its effective capture of low-level semantics. Extensive experiments demonstrate that IC2VQA achieves a high success rate in attacking three black-box VQA models. We compare our method with existing black-box attack strategies, highlighting its superiority in terms of attack success within the same number of iterations and levels of attack strength. We believe that the proposed method will contribute to the deeper analysis of robust VQA metrics.
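The attack described in the abstract can be sketched as an iterative sign-gradient (I-FGSM-style) loop over video frames that ascends a white-box image-quality score under an L-infinity budget, while a feature-alignment term keeps perturbations semantically consistent. The sketch below is illustrative only: `iqa_score` and `clip_like_features` are toy stand-ins I invented for this example, not the actual IQA network or CLIP encoder used by IC2VQA.

```python
import numpy as np

def iqa_score(frame):
    # Toy white-box IQA surrogate (assumption): rewards mid-range intensity.
    return -np.mean((frame - 0.5) ** 2)

def iqa_grad(frame):
    # Analytic gradient of the toy score w.r.t. the frame pixels.
    return -2.0 * (frame - 0.5) / frame.size

def clip_like_features(frame):
    # Stand-in for CLIP low-level features: per-channel means (illustrative).
    return frame.mean(axis=(0, 1))

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def attack_video(video, eps=0.03, alpha=0.005, steps=10):
    """Per-frame sign-gradient ascent on the surrogate quality score,
    projected back into an L-inf ball of radius eps around each frame."""
    adv = video.copy()
    for t in range(video.shape[0]):
        frame = video[t]
        for _ in range(steps):
            g = iqa_grad(adv[t])
            adv[t] = adv[t] + alpha * np.sign(g)                # ascend score
            adv[t] = np.clip(adv[t], frame - eps, frame + eps)  # L-inf budget
            adv[t] = np.clip(adv[t], 0.0, 1.0)                  # valid pixels
    return adv

rng = np.random.default_rng(0)
video = rng.uniform(0.0, 1.0, size=(4, 8, 8, 3))  # 4 tiny RGB frames in [0, 1]
adv = attack_video(video)

print("budget respected:", np.max(np.abs(adv - video)) <= 0.03 + 1e-9)
print("score raised:", iqa_score(adv[0]) > iqa_score(video[0]))
print("feature alignment:", cosine(clip_like_features(adv[0]),
                                   clip_like_features(video[0])))
```

In the paper's actual setting the gradient would come from a real differentiable IQA model, the alignment term from CLIP embeddings, and success would be measured on the black-box VQA target; this loop only conveys the perturb-project structure of such attacks.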
Problem

Research questions and friction points this paper is trying to address.

Video Quality Evaluation
Adversarial Attacks
System Vulnerability
Innovation

Methods, ideas, or system contributions that make the work stand out.

IC2VQA
Adversarial Attacks
Video Quality Assessment
πŸ”Ž Similar Papers
No similar papers found.
Georgii Gotin
Lomonosov Moscow State University, Moscow, Russia
Ekaterina Shumitskaya
Lomonosov Moscow State University, Moscow, Russia; ISP RAS Research Center for Trusted Artificial Intelligence, Moscow, Russia; MSU Institute for Artificial Intelligence, Moscow, Russia
Anastasia Antsiferova
MSU AI Institute, ISP RAS, Innopolis University
machine learning, computer vision, video compression, adversarial robustness
Dmitriy Vatolin
Lomonosov Moscow State University
Video Compression, Video Processing, Stereo Processing, 3D Video, Video Quality