🤖 AI Summary
Prompt injection attacks pose an increasingly severe security threat to large language model (LLM) integrated applications, yet existing black-box attack methods suffer from limited practical efficacy. Method: This paper proposes HouYi—the first real-world-oriented, three-stage black-box prompt injection framework comprising pre-prompt injection, context-aware segmentation, and malicious payload delivery. HouYi uniquely enables automated triggering of high-impact consequences—including arbitrary LLM misuse and application-level prompt stealing—via black-box fuzzing, context-aware prompt engineering, and web-injection-inspired modeling. Contribution/Results: Evaluated through real-world penetration testing across 36 mainstream LLM applications, HouYi uncovered 31 critical vulnerabilities, independently confirmed by ten vendors—including Notion—with impact on millions of users. The work significantly advances LLM security practice by bridging the gap between theoretical attack models and deployable, scalable exploitation techniques.
📝 Abstract
Large Language Models (LLMs), renowned for their superior proficiency in language comprehension and generation, stimulate a vibrant ecosystem of applications around them. However, their extensive assimilation into various services introduces significant security risks. This study deconstructs the complexities and implications of prompt injection attacks on actual LLM-integrated applications. Initially, we conduct an exploratory analysis on ten commercial applications, highlighting the constraints of current attack strategies in practice. Prompted by these limitations, we subsequently formulate HouYi, a novel black-box prompt injection attack technique, which draws inspiration from traditional web injection attacks. HouYi is compartmentalized into three crucial elements: a seamlessly-incorporated pre-constructed prompt, an injection prompt inducing context partition, and a malicious payload designed to fulfill the attack objectives. Leveraging HouYi, we unveil previously unknown and severe attack outcomes, such as unrestricted arbitrary LLM usage and uncomplicated application prompt theft. We deploy HouYi on 36 actual LLM-integrated applications and discern 31 applications susceptible to prompt injection. 10 vendors have validated our discoveries, including Notion, which has the potential to impact millions of users. Our investigation illuminates both the possible risks of prompt injection attacks and the possible tactics for mitigation.