AI Summary
This work addresses the challenge of modeling AI persuasion in realistic scenarios involving a mix of verifiable and unverifiable information. The authors propose MixTalk, a game-theoretic framework that formulates strategic communication by integrating both types of information: a sender LLM strategically combines statements to convey private information, while a receiver LLM infers the true state under a limited verification budget. To derive robust strategies, they introduce Tournament Oracle Policy Distillation (TOPD), which distills effective verification and inference policies from multi-agent interaction logs. Large-scale tournament experiments reveal significant deficiencies in current LLMs' ability to perform credibility-aware reasoning, whereas TOPD substantially enhances the receiver's robustness against persuasive manipulation.
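The TOPD idea sketched above, distilling a strong policy from interaction logs and deploying it in-context, can be illustrated with a minimal toy. All field names (`receiver_reward`, `claims`, `verified`, `inference`) and the selection rule are our assumptions for illustration, not the paper's exact method:

```python
# Hypothetical sketch of Tournament Oracle Policy Distillation (TOPD):
# keep the highest-reward receiver episodes from tournament logs and
# reuse them as few-shot exemplars at inference time.

def distill_oracle_policy(logs, top_k=3):
    """Select the top_k episodes by receiver reward and format them
    as in-context exemplars (an assumed distillation rule)."""
    best = sorted(logs, key=lambda ep: ep["receiver_reward"], reverse=True)[:top_k]
    exemplars = []
    for ep in best:
        exemplars.append(
            f"Claims: {ep['claims']}\n"
            f"Verified: {ep['verified']}\n"
            f"Inference: {ep['inference']}"
        )
    return "\n\n".join(exemplars)

def build_receiver_prompt(oracle_exemplars, new_claims):
    """Deploy the distilled policy in-context: prepend the exemplars
    to the receiver's prompt for a new round."""
    return (
        "You are the receiver. Follow the verification and inference "
        "strategy shown in these past episodes:\n\n"
        f"{oracle_exemplars}\n\nNew claims: {new_claims}\nYour move:"
    )

logs = [
    {"receiver_reward": 1.0, "claims": "battery lasts 10h (verifiable)",
     "verified": "battery claim: true", "inference": "state = reliable"},
    {"receiver_reward": 0.1, "claims": "best product ever (unverifiable)",
     "verified": "nothing verified", "inference": "state = reliable"},
]
prompt = build_receiver_prompt(distill_oracle_policy(logs, top_k=1),
                               "waterproof to 50m (verifiable)")
```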
Abstract
Agents powered by large language models (LLMs) are increasingly deployed in settings where communication shapes high-stakes decisions, making a principled understanding of strategic communication essential. Prior work largely studies either unverifiable cheap talk or fully verifiable disclosure, failing to capture realistic domains in which information has probabilistic credibility. We introduce MixTalk, a strategic communication game for LLM-to-LLM interaction that models information credibility. In MixTalk, a sender agent strategically combines verifiable and unverifiable claims to communicate private information, while a receiver agent allocates a limited budget to costly verification and infers the underlying state from prior beliefs, claims, and verification outcomes. We evaluate state-of-the-art LLM agents in large-scale tournaments across three realistic deployment settings, revealing their strengths and limitations in reasoning about information credibility and the explicit behaviors that shape these interactions. Finally, we propose Tournament Oracle Policy Distillation (TOPD), an offline method that distills a tournament oracle policy from interaction logs and deploys it in-context at inference time. Our results show that TOPD significantly improves receiver robustness to persuasion.
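One round of the game described above can be sketched as a toy simulation. The binary state, the always-assert-positive sender, the likelihood ratios, and the budgeted verification rule below are illustrative assumptions, not MixTalk's exact formulation:

```python
import random

# Hypothetical sketch of one MixTalk-style round: a sender who knows the
# true state (0 or 1) emits claims, each a (asserted_state, verifiable)
# pair; the receiver verifies up to `budget` verifiable claims and forms
# a posterior over the state.

def sender_claims(state, n_claims, p_verifiable, rng):
    """A maximally persuasive sender: always asserts state 1, mixing
    verifiable claims (checkable against the truth) with cheap talk."""
    return [(1, rng.random() < p_verifiable) for _ in range(n_claims)]

def receiver_infer(state, claims, budget, prior=0.5):
    """Verify up to `budget` verifiable claims, then update beliefs
    with an assumed likelihood ratio per verification outcome;
    unverified claims carry only weak (cheap-talk) evidence."""
    verified = [c for c in claims if c[1]][:budget]
    for asserted, _ in verified:
        correct = (asserted == state)          # verification reveals truth
        lr = 4.0 if correct else 0.25          # assumed likelihood ratio
        odds = prior / (1 - prior) * lr
        prior = odds / (1 + odds)
    unchecked = len(claims) - len(verified)
    odds = prior / (1 - prior) * (1.1 ** unchecked)  # weak cheap-talk tilt
    posterior = odds / (1 + odds)
    return (1 if posterior > 0.5 else 0), posterior

rng = random.Random(0)
claims = sender_claims(state=0, n_claims=3, p_verifiable=0.5, rng=rng)
decision, posterior = receiver_infer(0, claims, budget=2)
```

Under these assumptions, verifying even one claim against a lying sender pushes the posterior toward the true state, which is the intuition behind the receiver's budget allocation problem.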