EigenShield: Causal Subspace Filtering via Random Matrix Theory for Adversarially Robust Vision-Language Models

📅 2025-02-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Vision-language models (VLMs) exhibit poor robustness against adversarial attacks, while existing defenses incur high computational overhead, depend strongly on model architecture, and fail against adaptive attacks. Method: This paper proposes a training-free, inference-time defense framework. It pioneers the use of random matrix theory and the spiked covariance model to characterize the spectral structure of VLM representations, enabling separation of causal from correlational feature subspaces. A Robustness-based Nonconformity Score (RbNS) and a quantile-based dynamic thresholding mechanism are then introduced for efficient adversarial-sample filtering. Contribution/Results: The work establishes spectral analysis as a principled pathway to adversarial robustness: the defense is both architecture- and attack-agnostic. Extensive experiments demonstrate that EigenShield significantly reduces attack success rates across diverse VLM architectures and mainstream adversarial benchmarks, consistently outperforming state-of-the-art defenses including adversarial training, UNIGUARD, and CIDER.
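The spiked covariance step can be illustrated with a short sketch. The snippet below is an assumption-laden illustration, not the paper's implementation: it estimates the noise level from the median eigenvalue, places the Marchenko-Pastur bulk edge accordingly, and keeps any eigenvector whose eigenvalue escapes the bulk as a "causal" direction. Names such as `causal_eigenvectors` and `bulk_edge` are hypothetical.

```python
# Minimal sketch of spike/bulk separation under the spiked covariance model.
# Assumptions (not from the paper): embeddings are row-stacked in X (n x d),
# the noise variance is estimated from the median eigenvalue, and any
# eigenvalue above the Marchenko-Pastur bulk edge counts as a "causal" spike.
import numpy as np

def causal_eigenvectors(X):
    n, d = X.shape
    Xc = X - X.mean(axis=0, keepdims=True)          # center the embeddings
    C = Xc.T @ Xc / n                                # sample covariance (d x d)
    evals, evecs = np.linalg.eigh(C)                 # ascending eigenvalues
    evals, evecs = evals[::-1], evecs[:, ::-1]       # sort descending
    gamma = d / n                                    # aspect ratio
    sigma2 = np.median(evals)                        # crude noise-level estimate
    bulk_edge = sigma2 * (1 + np.sqrt(gamma)) ** 2   # Marchenko-Pastur upper edge
    k = int(np.sum(evals > bulk_edge))               # number of spike directions
    return evecs[:, :k], evals[:k]                   # causal subspace basis

# Toy usage: 512 embeddings of dimension 128 with 3 planted signal directions.
rng = np.random.default_rng(0)
signal = rng.standard_normal((512, 3)) @ (5.0 * rng.standard_normal((3, 128)))
X = signal + rng.standard_normal((512, 128))
V, spikes = causal_eigenvectors(X)
print(V.shape, spikes)  # expect roughly 3 recovered spike directions
```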

📝 Abstract
Vision-Language Models (VLMs) inherit the adversarial vulnerabilities of Large Language Models (LLMs), which are further exacerbated by their multimodal nature. Existing defenses, including adversarial training, input transformations, and heuristic detection, are computationally expensive, architecture-dependent, and fragile against adaptive attacks. We introduce EigenShield, an inference-time defense leveraging Random Matrix Theory to quantify adversarial disruptions in high-dimensional VLM representations. Unlike prior methods that rely on empirical heuristics, EigenShield employs the spiked covariance model to detect structured spectral deviations. Using a Robustness-based Nonconformity Score (RbNS) and quantile-based thresholding, it separates causal eigenvectors, which encode semantic information, from correlational eigenvectors that are susceptible to adversarial artifacts. By projecting embeddings onto the causal subspace, EigenShield filters adversarial noise without modifying model parameters or requiring adversarial training. This architecture-independent, attack-agnostic approach significantly reduces the attack success rate, establishing spectral analysis as a principled alternative to conventional defenses. Our results demonstrate that EigenShield consistently outperforms all existing defenses, including adversarial training, UNIGUARD, and CIDER.
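The core filtering operation described in the abstract, projecting embeddings onto the causal subspace, reduces to a rank-k orthogonal projection. A minimal sketch follows, assuming a basis `V` of causal eigenvectors like the one from the previous snippet; `project_to_causal` and the toy check are illustrative, not the paper's pipeline.

```python
# Minimal sketch of causal-subspace projection, assuming V (d x k) has
# orthonormal columns spanning the causal subspace.
import numpy as np

def project_to_causal(X, V):
    """Project row-stacked embeddings X (n x d) onto span(V)."""
    return (X @ V) @ V.T  # equals X @ (V V^T), cheaper when k << d

# Toy check: a perturbation orthogonal to the causal basis is removed exactly.
rng = np.random.default_rng(1)
V, _ = np.linalg.qr(rng.standard_normal((64, 4)))  # random orthonormal basis
clean = rng.standard_normal((8, 4)) @ V.T          # embeddings in the subspace
noise = rng.standard_normal((8, 64))
noise -= project_to_causal(noise, V)               # keep only the orthogonal part
print(np.allclose(project_to_causal(clean + noise, V), clean))  # True
```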
Problem

Research questions and friction points this paper is trying to address.

How to mitigate adversarial vulnerabilities in Vision-Language Models
How to filter adversarial noise via causal subspace projection
How to reduce attack success rates without modifying model parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages Random Matrix Theory and the spiked covariance model to analyze VLM representation spectra
Introduces a Robustness-based Nonconformity Score (RbNS) with quantile-based dynamic thresholding (see the sketch after this list)
Filters adversarial noise by projecting embeddings onto the causal subspace
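A sketch of the quantile-based thresholding step follows. The paper's exact RbNS definition is not reproduced in this summary, so the score below, residual energy outside the causal subspace, is a placeholder assumption; only the calibrate-on-clean-data-then-threshold-at-a-quantile pattern reflects the described mechanism. All function names are hypothetical.

```python
# Sketch of quantile-based dynamic thresholding. The score is a stand-in for
# RbNS (an assumption, not the paper's definition): the fraction of embedding
# energy falling OUTSIDE the causal subspace spanned by V.
import numpy as np

def nonconformity(X, V):
    """Placeholder score: per-embedding residual energy outside span(V)."""
    resid = X - (X @ V) @ V.T
    return (np.linalg.norm(resid, axis=1) ** 2
            / (np.linalg.norm(X, axis=1) ** 2 + 1e-12))  # guard zero rows

def fit_threshold(scores_clean, q=0.95):
    """Calibrate the threshold as the q-quantile of scores on clean inputs."""
    return np.quantile(scores_clean, q)

def flag_adversarial(X, V, tau):
    return nonconformity(X, V) > tau  # True => filtered as adversarial

# Usage pattern: calibrate on clean data, then screen incoming samples:
#   tau = fit_threshold(nonconformity(X_clean, V))
#   mask = flag_adversarial(X_new, V, tau)
```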