🤖 AI Summary
This work proposes a query-aware minimal adversarial attack method that exploits the sensitivity of neural text ranking models to single-word perturbations. By inserting or replacing a single “query-centric” term in a document—carefully chosen to align semantically with the query—the approach significantly elevates the target document’s rank with minimal modification. Combining heuristic and gradient-guided strategies, the method enables efficient white-box attacks on BERT- and monoT5-based re-rankers. Experiments on TREC-DL 2019/2020 show a success rate of up to 91%, with fewer than two word edits per document on average, achieving ranking improvements comparable to or better than PRADA with substantially fewer edits. The study further identifies a “Goldilocks zone” where mid-ranked documents are most vulnerable to attack and introduces novel metrics to assess model sensitivity to such adversarial perturbations.
📝 Abstract
Neural ranking models (NRMs) achieve strong retrieval effectiveness, yet prior work has shown they are vulnerable to adversarial perturbations. We revisit this robustness question with a minimal, query-aware attack that promotes a target document by inserting or substituting a single, semantically aligned word: the query center. We study heuristic and gradient-guided variants, including a white-box method that identifies influential insertion points. On TREC-DL 2019/2020 with BERT and monoT5 re-rankers, our single-word attacks achieve up to 91% success while modifying fewer than two tokens per document on average, and they deliver rank and score boosts competitive with PRADA under a comparable white-box setup while requiring far fewer edits. We also introduce new diagnostic metrics to analyze attack sensitivity beyond aggregate success rates. Our analysis reveals a Goldilocks zone in which mid-ranked documents are most vulnerable. These findings demonstrate practical risks and motivate future defenses for robust neural ranking.
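The single-word promotion attack described above can be sketched as a greedy search over candidate terms and insertion points. This is a minimal illustration of the heuristic variant only: the toy bag-of-words `relevance_score` stands in for the BERT/monoT5 re-ranker, and the function names and candidate list are illustrative assumptions, not the paper's actual implementation.

```python
def relevance_score(query, doc):
    # Toy bag-of-words scorer standing in for a neural re-ranker.
    # (In the paper's setting this would be a BERT or monoT5 model.)
    query_terms = set(query.lower().split())
    doc_terms = doc.lower().split()
    return sum(1 for w in doc_terms if w in query_terms) / (len(doc_terms) or 1)

def single_word_attack(query, doc, candidates):
    """Greedy heuristic sketch: try each candidate 'query-centric' term
    at every insertion point and keep the single edit that raises the
    scorer's output the most. Returns (best_score, perturbed_doc)."""
    words = doc.split()
    best_score, best_doc = relevance_score(query, doc), doc
    for term in candidates:
        for i in range(len(words) + 1):
            perturbed = " ".join(words[:i] + [term] + words[i:])
            s = relevance_score(query, perturbed)
            if s > best_score:
                best_score, best_doc = s, perturbed
    return best_score, best_doc

# Hypothetical example inputs (not from the paper's datasets).
query = "neural ranking robustness"
doc = "retrieval models score documents for search"
score, adv = single_word_attack(query, doc, ["ranking", "lexicon"])
```

A gradient-guided variant would replace the exhaustive position loop with gradients of the model score with respect to the input embeddings, ranking insertion points by gradient magnitude instead of trying all of them.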