CompanionCast: A Multi-Agent Conversational AI Framework with Spatial Audio for Social Co-Viewing Experiences

📅 2025-12-11
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address growing isolation in modern media consumption and the loss of shared social presence when viewing alone, this paper proposes CompanionCast, a multi-agent conversational AI framework for real-time, content-aware collaborative viewing. Methodologically, it integrates multimodal video understanding, role-aware large language model (LLM) agents, and spatial-audio-driven speech synthesis, and introduces an "LLM-as-a-Judge" module that scores conversations along five dimensions (relevance, authenticity, engagement, diversity, personality consistency) to close an evaluation-generation refinement loop. Key contributions include: (1) a generalizable multi-agent co-viewing orchestration framework, and (2) an evaluator-driven mechanism for iteratively refining dialogue quality. A pilot study with soccer fans watching football matches suggests improved perceived social presence relative to solo viewing, and the authors discuss extending the approach to other viewing contexts, including education and entertainment.
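The summary names an evaluation-generation closed loop but not its mechanics. The sketch below shows one plausible shape for such a loop; the function names, the 1-5 scale, the 4.0 threshold, and the round budget are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a score-then-refine loop over the five judge dimensions.
# All names and numbers here are assumptions for illustration.
import random
from dataclasses import dataclass

DIMENSIONS = ["relevance", "authenticity", "engagement",
              "diversity", "personality_consistency"]

@dataclass
class Judgment:
    scores: dict    # per-dimension score, assumed on a 1-5 scale
    feedback: dict  # per-dimension critique fed back to the agents

def score_dialogue(dialogue):
    # Stand-in for a judge-LLM call with a scoring rubric; random scores let
    # the loop run end to end without an API key.
    return Judgment(
        scores={d: random.uniform(3.0, 5.0) for d in DIMENSIONS},
        feedback={d: f"tighten {d}" for d in DIMENSIONS},
    )

def refine_dialogue(dialogue, judgment):
    # Stand-in for regenerating the turns with the judge's critiques in context.
    return dialogue + ["(turns revised per judge feedback)"]

def closed_loop(dialogue, threshold=4.0, max_rounds=3):
    """Score, then refine, until every dimension clears the threshold
    or the round budget is exhausted."""
    for _ in range(max_rounds):
        judgment = score_dialogue(dialogue)
        if all(s >= threshold for s in judgment.scores.values()):
            break
        dialogue = refine_dialogue(dialogue, judgment)
    return dialogue

print(closed_loop(["Agent A: What a pass!", "Agent B: The keeper never saw it."]))
```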

πŸ“ Abstract
Social presence is central to the enjoyment of watching content together, yet modern media consumption is increasingly solitary. We investigate whether multi-agent conversational AI systems can recreate the dynamics of shared viewing experiences across diverse content types. We present CompanionCast, a general framework for orchestrating multiple role-specialized AI agents that respond to video content using multimodal inputs, speech synthesis, and spatial audio. Distinctly, CompanionCast integrates an LLM-as-a-Judge module that iteratively scores and refines conversations across five dimensions (relevance, authenticity, engagement, diversity, personality consistency). We validate this framework through sports viewing, a domain with rich dynamics and strong social traditions, where a pilot study with soccer fans suggests that multi-agent interaction improves perceived social presence compared to solo viewing. We contribute: (1) a generalizable framework for orchestrating multi-agent conversations around multimodal video content, (2) a novel evaluator-agent pipeline for conversation quality control, and (3) exploratory evidence of increased social presence in AI-mediated co-viewing. We discuss challenges and future directions for applying this approach to diverse viewing contexts including entertainment, education, and collaborative watching experiences.
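The abstract mentions spatial audio for placing agent voices but does not specify a technique. As one concrete illustration, a minimal equal-power stereo pan is sketched below; this is a generic method, not necessarily what CompanionCast uses.

```python
# Equal-power stereo panning: one simple way to give each agent's synthesized
# voice a distinct position. Generic illustration only; the paper's actual
# spatial-audio pipeline is not specified here.
import numpy as np

def pan_equal_power(mono: np.ndarray, pan: float) -> np.ndarray:
    """Place a mono signal in the stereo field.
    pan: -1.0 = hard left, 0.0 = center, +1.0 = hard right."""
    theta = (pan + 1.0) * np.pi / 4.0      # map [-1, 1] onto [0, pi/2]
    return np.stack([mono * np.cos(theta), mono * np.sin(theta)], axis=-1)

# Example: a 1-second test tone placed slightly left, as one agent's voice might be.
sr = 16_000
t = np.linspace(0.0, 1.0, sr, endpoint=False)
tone = 0.2 * np.sin(2 * np.pi * 440.0 * t)
stereo = pan_equal_power(tone, pan=-0.5)
print(stereo.shape)  # (16000, 2)
```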
Problem

Research questions and friction points this paper is trying to address.

Enhancing social presence in solitary media consumption
Orchestrating multi-agent AI conversations for shared viewing
Evaluating and refining AI-mediated co-viewing experiences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent conversational AI framework with spatial audio (see the orchestration sketch after this list)
LLM-as-a-Judge module for iterative conversation scoring
Multimodal input integration for video content response
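
To make the orchestration idea concrete, here is a minimal sketch of role-specialized agents fanning out over a detected video event, with each agent carrying a spatial position for downstream TTS rendering. The roles, event format, and azimuth values are invented for illustration; they are not taken from the paper.

```python
# Role-aware agent fan-out over a video event. Agent roles, the event string,
# and azimuth angles are illustrative assumptions, not the paper's design.
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    persona: str      # role prompt, e.g. tactical analyst vs. excitable fan
    azimuth_deg: int  # where this agent's voice sits in the sound field

AGENTS = [
    Agent("Analyst", "calm, stats-minded commentator", azimuth_deg=-45),
    Agent("Fan", "excitable home-team supporter", azimuth_deg=45),
]

def respond(agent: Agent, event: str) -> str:
    # Stand-in for an LLM call conditioned on the agent's persona and the event.
    return f"[{agent.name}] reacts to: {event}"

def on_video_event(event: str) -> list:
    """Fan a detected event out to every agent; the (utterance, azimuth) pairs
    would feed a TTS and spatial-audio renderer downstream."""
    return [(respond(a, event), a.azimuth_deg) for a in AGENTS]

print(on_video_event("goal scored in the 72nd minute"))
```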