Reproducing HotFlip for Corpus Poisoning Attacks in Dense Retrieval

📅 2025-01-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing HotFlip-based corpus poisoning attacks against dense retrieval systems are computationally inefficient and assume access to a set of user queries. Method: This paper reproduces HotFlip, a gradient-based word substitution attack, and speeds up adversarial passage generation, whose cost is dominated by gradient accumulation over query-passage pairs. It then extends the evaluation to two additional settings: (i) transfer-based black-box attacks and (ii) query-agnostic attacks. Contribution/Results: Per-document generation time drops from 4 hours to 15 minutes on the same hardware. Across a variety of dense retrievers, HotFlip remains an effective attack, but its performance diminishes against more advanced and recent models. It performs poorly in the black-box setting, indicating limited capacity for generalization, while in the query-agnostic setting its performance correlates with the volume of injected adversarial passages.

📝 Abstract

HotFlip is a topical gradient-based word substitution method for attacking language models. Recently, this method has been further applied to attack retrieval systems by generating malicious passages that are injected into a corpus, i.e., corpus poisoning. However, HotFlip is known to be computationally inefficient, with the majority of time being spent on gradient accumulation for each query-passage pair during the adversarial token generation phase, making it impossible to generate an adequate number of adversarial passages in a reasonable amount of time. Moreover, the attack method itself assumes access to a set of user queries, a strong assumption that does not correspond to how real-world adversarial attacks are usually performed. In this paper, we first significantly boost the efficiency of HotFlip, reducing the adversarial generation process from 4 hours per document to only 15 minutes, using the same hardware. We further contribute experiments and analysis on two additional tasks: (1) transfer-based black-box attacks, and (2) query-agnostic attacks. Whenever possible, we provide comparisons between the original method and our improved version. Our experiments demonstrate that HotFlip can effectively attack a variety of dense retrievers, with an observed trend that its attack performance diminishes against more advanced and recent methods. Interestingly, we observe that while HotFlip performs poorly in a black-box setting, indicating limited capacity for generalization, in query-agnostic scenarios its performance is correlated to the volume of injected adversarial passages.
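The gradient-based substitution step that the abstract describes can be illustrated with a toy sketch. This is not the authors' implementation: it assumes the commonly described first-order form of HotFlip, in which replacing the current token with candidate v is scored by (e_v - e_cur) · g, where g is the gradient of the attack objective with respect to the embedding at that position, averaged over a batch of queries. The function names, the tiny embedding table, and the hand-written gradients are all hypothetical; in a real attack the gradients would come from backpropagation through the retriever.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_gradient(per_query_grads: List[Vector]) -> List[float]:
    """Average the per-query gradients for one token position.

    Averaging over a batch is the accumulation step; computing the
    per-query gradients is assumed to happen elsewhere.
    """
    n = len(per_query_grads)
    return [sum(column) / n for column in zip(*per_query_grads)]


def hotflip_candidates(
    embedding_table: List[Vector],
    current_token: int,
    grad: Vector,
    top_k: int = 3,
) -> List[int]:
    """Rank replacement tokens by estimated first-order gain.

    gain(v) = (e_v - e_cur) . grad, the linear estimate of how much
    the objective increases if the current token is swapped for v.
    """
    e_cur = embedding_table[current_token]
    scored = []
    for token_id, e_v in enumerate(embedding_table):
        if token_id == current_token:
            continue  # swapping a token for itself gains nothing
        delta = [a - b for a, b in zip(e_v, e_cur)]
        scored.append((dot(delta, grad), token_id))
    # Highest gain first; ties broken by smaller token id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [token_id for _, token_id in scored[:top_k]]


# Hypothetical toy data: 4 tokens with 2-dimensional embeddings.
toy_table = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Hand-written gradients for one token position, one per query.
toy_grads = [[1.0, 0.0], [1.0, 2.0]]

avg_grad = mean_gradient(toy_grads)
print(hotflip_candidates(toy_table, current_token=0, grad=avg_grad, top_k=2))
```

On this toy input the averaged gradient is [1.0, 1.0], so token 3 receives the largest estimated gain and the script prints [3, 1]. In practice the top-ranked candidates are usually re-scored with a real forward pass before a swap is accepted, since the linear estimate can be inaccurate.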

Problem

Research questions and friction points this paper is trying to address.

HotFlip Method

Adversarial Attacks

Efficiency and Adaptability

Innovation

Methods, ideas, or system contributions that make the work stand out.

HotFlip Enhancement

Black-box Attacks

Query-agnostic Attacks

🔎 Similar Papers

Whispers in Grammars: Injecting Covert Backdoors to Compromise Dense Retrieval Systems