First Steps Towards Overhearing LLM Agents: A Case Study With Dungeons & Dragons Gameplay

📅 2025-05-28
🤖 AI Summary
Problem: Existing conversational AI agents typically intervene directly in human dialogue, which limits their applicability in collaborative, narrative-rich settings such as tabletop role-playing games (e.g., Dungeons & Dragons), where unobtrusive support for the Dungeon Master (DM) is essential. Method: This paper introduces “overhearing agents”: large multimodal audio-language models that listen passively to live voice interactions without participating, infer context from acoustic cues (e.g., prosody, pauses, speaker turn patterns), and assist the DM in the background. The paper proposes an end-to-end system integrating automatic speech recognition, semantic grounding, and game-state modeling. Contribution/Results: The paper presents the first empirical demonstration that large models can perform zero-shot, instruction-free overhearing by leveraging paralinguistic features. Human evaluation shows significant improvements in DM decision-making efficiency and narrative coherence. The paper open-sources the full implementation and a Python library, establishing foundational infrastructure for non-intrusive, context-aware intelligent agents.

📝 Abstract
Much work has been done on conversational LLM agents which directly assist human users with tasks. We present an alternative paradigm for interacting with LLM agents, which we call "overhearing agents". These overhearing agents do not actively participate in conversation; instead, they "listen in" on human-to-human conversations and perform background tasks or provide suggestions to assist the user. In this work, we explore the overhearing agents paradigm through the lens of Dungeons & Dragons gameplay. We present an in-depth study using large multimodal audio-language models as overhearing agents to assist a Dungeon Master. We perform a human evaluation to examine the helpfulness of such agents and find that some large audio-language models have the emergent ability to perform overhearing agent tasks using implicit audio cues. Finally, we release Python libraries and our project code to support further research into the overhearing agents paradigm at https://github.com/zhudotexe/overhearing_agents.
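
The overhearing pattern described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation or the API of their released libraries: the names `Utterance`, `OverhearingAgent`, and `run_session` are hypothetical, text utterances stand in for the live audio the paper works with, and a keyword table stands in for the audio-language model's judgment. The point is only the control flow: the agent receives every utterance, never replies into the conversation, and writes suggestions to a side channel that only the Dungeon Master would see.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Utterance:
    """One turn of the human-to-human conversation (text stands in for audio)."""
    speaker: str
    text: str


@dataclass
class OverhearingAgent:
    """Hypothetical agent that listens but never speaks in the conversation."""
    # Assumption: a keyword -> suggestion table replaces the model's decision.
    triggers: dict[str, str]
    # Side channel: suggestions are shown to the DM only, not to the table.
    suggestions: list[str] = field(default_factory=list)

    def overhear(self, utterance: Utterance) -> None:
        """Inspect an utterance and, if relevant, queue a background suggestion."""
        lowered = utterance.text.lower()
        for keyword, suggestion in self.triggers.items():
            if keyword in lowered:
                self.suggestions.append(
                    f"{suggestion} (heard from {utterance.speaker})"
                )


def run_session(agent: OverhearingAgent, conversation: list[Utterance]) -> list[str]:
    """Feed every utterance to the agent; the conversation itself is untouched."""
    for utterance in conversation:
        agent.overhear(utterance)
    return agent.suggestions


if __name__ == "__main__":
    agent = OverhearingAgent(triggers={"goblin": "Look up goblin stat block"})
    session = [
        Utterance("DM", "A goblin jumps out of the bushes!"),
        Utterance("Player", "I draw my sword."),
    ]
    for line in run_session(agent, session):
        print(line)
```

In a real system the keyword check would be replaced by a model call over the audio stream, but the separation between the conversation and the suggestion channel would stay the same.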
Problem

Research questions and friction points this paper is trying to address.

Exploring overhearing LLM agents for background task assistance
Evaluating audio-language models in Dungeons & Dragons gameplay
Developing tools to support overhearing agent paradigm research
Innovation

Methods, ideas, or system contributions that make the work stand out.

Overhearing agents listen to human conversations
Multimodal audio-language models assist Dungeon Master
Python libraries released for further research
Andrew Zhu
PhD Student, University of Pennsylvania
Natural Language Processing
Evan Osgood
University of Pennsylvania
Christopher Callison-Burch
University of Pennsylvania