🤖 AI Summary
This work addresses a novel security risk in LLM-based agent systems: sensitive data leakage during inference caused by external tool integration. We formally define *inference-time data confidentiality vulnerabilities* in tool-augmented LLMs—the first systematic characterization of such threats. Methodologically, we propose the first confidentiality threat assessment framework for this setting, integrating threat modeling, adversarial tool integration testing, and quantitative sensitive information tracing. Our analysis uncovers two previously unrecognized attack vectors. Empirical evaluation across major open- and closed-source LLMs—under diverse tool configurations—demonstrates significant confidentiality degradation; certain attacks recover up to 100% of sensitive contextual information never explicitly referenced in prompts. This work establishes foundational theory and a reproducible, benchmarkable assessment methodology for secure and trustworthy deployment of LLM agents.
📝 Abstract
Large Language Models (LLMs) are increasingly augmented with external tools and commercial services into LLM-integrated systems. While these interfaces can significantly enhance the capabilities of the models, they also introduce a new attack surface. Manipulated integrations, for example, can exploit the model and compromise sensitive data accessed through other interfaces. While previous work primarily focused on attacks targeting a model's alignment or the leakage of training data, the security of data that is only available during inference has escaped scrutiny so far. In this work, we demonstrate the vulnerabilities associated with external components and introduce a systematic approach to evaluate confidentiality risks in LLM-integrated systems. We identify two specific attack scenarios unique to these systems and formalize these into a tool-robustness framework designed to measure a model's ability to protect sensitive information. Our findings show that all examined models are highly vulnerable to confidentiality attacks, with the risk increasing significantly when models are used together with external tools.