SNEAK: Evaluating Strategic Communication and Information Leakage in Large Language Models

📅 2026-03-31
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses strategic communication in multi-agent settings, where large language models must convey useful information to collaborators while avoiding disclosure of sensitive content, a capability that existing benchmarks do not measure under asymmetric information. The authors propose SNEAK, an evaluation framework designed specifically for strategic linguistic communication. SNEAK operationalizes the task by assigning each model a secret word, a semantic category, and a candidate word set, and requiring it to generate messages that implicitly hint at the secret without revealing it outright. The framework introduces two complementary metrics, utility and leakage, computed through role-based simulations involving an informed ally and an uninformed adversary (the chameleon), to systematically quantify the trade-off between effective information sharing and confidentiality. Experiments reveal that current models perform poorly on this task, with human participants achieving up to four times higher scores, underscoring strategic communication as a significant open challenge.
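To make the setup concrete, here is a minimal sketch of how one SNEAK-style instance could be rendered into a speaker prompt. The function name `build_speaker_prompt`, the field layout, and the instruction wording are illustrative assumptions, not the benchmark's actual prompt.

```python
def build_speaker_prompt(category: str, candidates: list[str], secret: str) -> str:
    """Render one task instance as a prompt for the speaker model.

    The speaker sees everything, including the secret word; the message
    it produces is later judged by agents with different information
    states. NOTE: the wording below is an illustrative guess, not the
    prompt used in the paper.
    """
    return (
        f"Category: {category}\n"
        f"Candidate words: {', '.join(candidates)}\n"
        f"Secret word: {secret}\n"
        "Write one short message that shows an informed reader you know "
        "the secret word, without stating it so plainly that an outsider "
        "could guess it from your message alone."
    )

# Hypothetical example instance:
print(build_speaker_prompt("fruit", ["apple", "mango", "kiwi", "plum"], "kiwi"))
```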
📝 Abstract
Large language models (LLMs) are increasingly deployed in multi-agent settings where communication must balance informativeness and secrecy. In such settings, an agent may need to signal information to collaborators while preventing an adversary from inferring sensitive details. However, existing LLM benchmarks primarily evaluate capabilities such as reasoning, factual knowledge, or instruction following, and do not directly measure strategic communication under asymmetric information. We introduce SNEAK (Secret-aware Natural language Evaluation for Adversarial Knowledge), a benchmark for evaluating selective information sharing in language models. In SNEAK, a model is given a semantic category, a candidate set of words, and a secret word, and must generate a message that indicates knowledge of the secret without revealing it too clearly. We evaluate generated messages using two simulated agents with different information states: an ally, who knows the secret and must identify the intended message, and a chameleon, who does not know the secret and attempts to infer it from the message. This yields two complementary metrics: utility, measuring how well the message communicates to collaborators, and leakage, measuring how much information it reveals to an adversary. Using this framework, we analyze the trade-off between informativeness and secrecy in modern language models and show that strategic communication under asymmetric information remains a challenging capability for current systems. Notably, human participants outperform all evaluated models by a large margin, achieving up to four times higher scores.
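The dual-metric protocol described above lends itself to a compact sketch. The following Python rendering of the scoring loop is a minimal sketch under stated assumptions: the callables `speak`, `ally`, and `chameleon` stand in for the speaker model and the two simulated agents, and the exact form of the ally's judgment (here, a binary accept/reject of the message) is an assumption rather than the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Instance:
    category: str           # semantic category shared by all candidates
    candidates: list[str]   # candidate word set visible to every agent
    secret: str             # secret word, known only to speaker and ally

def evaluate(
    instances: list[Instance],
    speak: Callable[[str, list[str], str], str],       # speaker model: sees the secret
    ally: Callable[[str, str, list[str], str], bool],  # informed agent: knows the secret
    chameleon: Callable[[str, str, list[str]], str],   # uninformed agent: guesses it
) -> tuple[float, float]:
    """Return (utility, leakage) for one speaker over a set of instances.

    utility  -- fraction of messages the informed ally accepts as
                demonstrating knowledge of the secret.
    leakage  -- fraction of instances where the uninformed chameleon
                recovers the secret from the message alone.
    """
    accepted = leaked = 0
    for inst in instances:
        # The speaker sees everything, including the secret word.
        msg = speak(inst.category, inst.candidates, inst.secret)
        # The ally knows the secret and judges whether the message signals it.
        if ally(msg, inst.category, inst.candidates, inst.secret):
            accepted += 1
        # The chameleon sees only the message, category, and candidates.
        if chameleon(msg, inst.category, inst.candidates) == inst.secret:
            leaked += 1
    n = len(instances)
    return accepted / n, leaked / n
```

Under this reading, a strong strategic communicator drives utility toward 1 while holding leakage near the chance rate of 1/|candidates|.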
Problem

Research questions and friction points this paper is trying to address.

strategic communication
information leakage
large language models
asymmetric information
selective information sharing
Innovation

Methods, ideas, or system contributions that make the work stand out.

strategic communication
information leakage
asymmetric information
adversarial evaluation
selective information sharing