Whispers in the Machine: Confidentiality in LLM-integrated Systems

📅 2024-02-10

🏛️ arXiv.org

📈 Citations: 23

✨ Influential: 1

career value

216K/year

🤖 AI Summary

This work addresses a novel security risk in LLM-based agent systems: sensitive data leakage during inference caused by external tool integration. We formally define *inference-time data confidentiality vulnerabilities* in tool-augmented LLMs—the first systematic characterization of such threats. Methodologically, we propose the first confidentiality threat assessment framework for this setting, integrating threat modeling, adversarial tool integration testing, and quantitative sensitive information tracing. Our analysis uncovers two previously unrecognized attack vectors. Empirical evaluation across major open- and closed-source LLMs—under diverse tool configurations—demonstrates significant confidentiality degradation; certain attacks recover up to 100% of sensitive contextual information never explicitly referenced in prompts. This work establishes foundational theory and a reproducible, benchmarkable assessment methodology for secure and trustworthy deployment of LLM agents.

Technology Category

Application Category

📝 Abstract

Large Language Models (LLMs) are increasingly augmented with external tools and commercial services into LLM-integrated systems. While these interfaces can significantly enhance the capabilities of the models, they also introduce a new attack surface. Manipulated integrations, for example, can exploit the model and compromise sensitive data accessed through other interfaces. While previous work primarily focused on attacks targeting a model's alignment or the leakage of training data, the security of data that is only available during inference has escaped scrutiny so far. In this work, we demonstrate the vulnerabilities associated with external components and introduce a systematic approach to evaluate confidentiality risks in LLM-integrated systems. We identify two specific attack scenarios unique to these systems and formalize these into a tool-robustness framework designed to measure a model's ability to protect sensitive information. Our findings show that all examined models are highly vulnerable to confidentiality attacks, with the risk increasing significantly when models are used together with external tools.

Problem

Research questions and friction points this paper is trying to address.

Assessing confidentiality risks in LLM-based agentic systems

Identifying attack scenarios unique to external tool integrations

Evaluating model vulnerabilities in protecting sensitive information

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based agents for user interactions

Tool-robustness framework for confidentiality

Systematic evaluation of attack scenarios

🔎 Similar Papers

Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions