🤖 AI Summary
This work addresses a fundamental security vulnerability in long-term memory-augmented large language models (LLMs), which rely on similarity-based retrieval mechanisms and are thus susceptible to black-box adversarial memory injection attacks. The study systematically uncovers this inherent flaw and introduces ER-MIA, a unified attack framework that formalizes two distinct threat scenarios—content-based and query-targeted attacks—and devises composable attack primitives with integrated strategies. Extensive experiments across multiple mainstream LLMs and memory architectures demonstrate high attack success rates, confirming the systemic and pervasive nature of this vulnerability across diverse models and memory system designs.
📝 Abstract
Large language models (LLMs) are increasingly augmented with long-term memory systems to overcome finite context windows and enable persistent reasoning across interactions. However, recent research finds that LLMs become more vulnerable because memory provides extra attack surfaces. In this paper, we present the first systematic study of black-box adversarial memory injection attacks that target the similarity-based retrieval mechanism in long-term memory-augmented LLMs. We introduce ER-MIA, a unified framework that exposes this vulnerability and formalizes two realistic attack settings: content-based attacks and question-targeted attacks. In these settings, ER-MIA includes an arsenal of composable attack primitives and ensemble attacks that achieve high success rates under minimal attacker assumptions. Extensive experiments across multiple LLMs and long-term memory systems demonstrate that similarity-based retrieval constitutes a fundamental and system-level vulnerability, revealing security risks that persist across memory designs and application scenarios.