🤖 AI Summary
Real-time detection of query-based black-box adversarial attacks remains challenging due to the lack of access to the victim model and the dynamic behavior of attacks.
Method: This paper proposes a model-agnostic dynamic detection framework that leverages evolutionary similarity patterns during iterative query-sample updates. We introduce Delta Similarity (DS), a novel metric quantifying input-space similarity shifts across successive queries, enabling adversarial behavior modeling in a dynamically evolving similarity space—departing from conventional static-input detection paradigms. The method integrates query-sequence modeling, dynamic update-pattern analysis, and unified supervised/unsupervised anomaly detection to enable *pre-generation* awareness of adversarial perturbations.
Results: Evaluated against eight state-of-the-art attacks—including adaptive variants designed to bypass existing defenses—our approach achieves significantly higher sensitivity and specificity than prior art. It demonstrates strong robustness across diverse models and threat settings, with low computational overhead suitable for practical deployment.
📝 Abstract
Adversarial attacks remain a significant threat that can jeopardize the integrity of Machine Learning (ML) models. In particular, query-based black-box attacks can generate malicious noise without access to the victim model's architecture, making them practical in real-world contexts. The community has proposed several defenses against adversarial attacks, only for them to be broken by more advanced and adaptive attack strategies. In this paper, we propose a framework that detects whether an adversarial noise instance is being generated. Unlike existing stateful defenses that detect adversarial noise generation by monitoring the input space, our approach learns adversarial patterns in the input update similarity space. Specifically, we propose to monitor a new metric called Delta Similarity (DS), which we show captures the adversarial behavior more effectively. We evaluate our approach against eight state-of-the-art attacks, including adaptive attacks, where the adversary is aware of the defense and tries to evade detection. We find that our approach is significantly more robust than existing defenses in terms of both specificity and sensitivity.
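The abstract does not give the exact definition of Delta Similarity, but the idea of tracking similarity *shifts* across successive queries can be illustrated with a minimal sketch. The sketch below is an assumption, not the paper's implementation: it uses cosine similarity between consecutive flattened inputs and takes the difference of successive similarities as a stand-in for DS (the function names `cosine_similarity` and `delta_similarity` are hypothetical).

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two flattened input vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def delta_similarity(queries):
    """Hypothetical DS proxy: how the similarity between successive
    queries changes from one update step to the next. Iterative
    query-based attacks tend to submit near-duplicate inputs, so their
    successive similarities stay high and shift only slightly."""
    sims = [cosine_similarity(queries[i], queries[i + 1])
            for i in range(len(queries) - 1)]
    return [sims[i + 1] - sims[i] for i in range(len(sims) - 1)]

# Example: attack-like queries are small perturbations of one input,
# so the similarity trajectory is nearly flat (DS values close to 0).
rng = np.random.default_rng(0)
base = rng.random(32)
attack_queries = [base + 0.01 * i * rng.random(32) for i in range(5)]
ds_trace = delta_similarity(attack_queries)
```

A detector in this spirit would then flag query sequences whose DS trace is anomalously stable (or otherwise patterned) compared to benign traffic, using either supervised or unsupervised anomaly detection as the summary describes.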