🤖 AI Summary
This work systematically uncovers, for the first time, the root causes and security implications of remote code execution (RCE) vulnerabilities in LLM-integrated applications (e.g., LangChain) induced by prompt injection attacks. We propose a lightweight static analysis technique to precisely identify framework-level RCE call chains, and design a prompt-driven, multi-stage validation and exploitation framework that comprehensively covers output hijacking, vulnerability triggering, and composite exploitation. Our approach detects 20 high-severity vulnerabilities—including 19 RCEs and one arbitrary file read/write—across 51 real-world applications; 17 were confirmed by vendors and 11 assigned CVE identifiers. We successfully demonstrated 17 practical exploits (16 RCEs and one SQL injection). This study fills a critical gap in the systematic security analysis of RCE in LLM applications, providing both theoretical foundations and practical tools for framework hardening and defense.
📝 Abstract
LLMs show promise in transforming software development, with a growing interest in integrating them into more intelligent apps. Frameworks like LangChain aid LLM-integrated app development, offering code execution utility/APIs for custom actions. However, these capabilities theoretically introduce Remote Code Execution (RCE) vulnerabilities, enabling remote code execution through prompt injections. No prior research systematically investigates these frameworks' RCE vulnerabilities or their impact on applications and exploitation consequences. Therefore, there is a huge research gap in this field. In this study, we propose LLMSmith to detect, validate and exploit the RCE vulnerabilities in LLM-integrated frameworks and apps. To achieve this goal, we develop two novel techniques, including 1) a lightweight static analysis to examine LLM integration mechanisms, and construct call chains to identify RCE vulnerabilities in frameworks; 2) a systematical prompt-based exploitation method to verify and exploit the found vulnerabilities in LLM-integrated apps. This technique involves various strategies to control LLM outputs, trigger RCE vulnerabilities and launch subsequent attacks. Our research has uncovered a total of 20 vulnerabilities in 11 LLM-integrated frameworks, comprising 19 RCE vulnerabilities and 1 arbitrary file read/write vulnerability. Of these, 17 have been confirmed by the framework developers, with 11 vulnerabilities being assigned CVE IDs. For the 51 apps potentially affected by RCE, we successfully executed attacks on 17 apps, 16 of which are vulnerable to RCE and 1 to SQL injection. Furthermore, we conduct a comprehensive analysis of these vulnerabilities and construct practical attacks to demonstrate the hazards in reality. Last, we propose several mitigation measures for both framework and app developers to counteract such attacks.