🤖 AI Summary
To address the growing isolation of modern media consumption and the lack of shared social presence during co-viewing, this paper proposes a multi-agent conversational AI framework for real-time, content-aware collaborative viewing. Methodologically, it integrates multimodal video understanding, role-aware large language model (LLM) agents, and spatial-audio-driven speech synthesis, and introduces an "LLM-as-a-Judge" module that scores conversations along five dimensions, forming a closed evaluation-generation optimization loop. Key contributions include: (1) the first generalizable multi-agent co-viewing orchestration framework; and (2) a reflective, dimension-wise dialogue-quality refinement mechanism. A pilot study in football match viewing suggests improvements in users' sense of social presence, and the authors discuss extending the framework to educational and entertainment scenarios.
📄 Abstract
Social presence is central to the enjoyment of watching content together, yet modern media consumption is increasingly solitary. We investigate whether multi-agent conversational AI systems can recreate the dynamics of shared viewing experiences across diverse content types. We present CompanionCast, a general framework for orchestrating multiple role-specialized AI agents that respond to video content using multimodal inputs, speech synthesis, and spatial audio. Distinctly, CompanionCast integrates an LLM-as-a-Judge module that iteratively scores and refines conversations across five dimensions (relevance, authenticity, engagement, diversity, personality consistency). We validate this framework through sports viewing, a domain with rich dynamics and strong social traditions, where a pilot study with soccer fans suggests that multi-agent interaction improves perceived social presence compared to solo viewing. We contribute: (1) a generalizable framework for orchestrating multi-agent conversations around multimodal video content, (2) a novel evaluator-agent pipeline for conversation quality control, and (3) exploratory evidence of increased social presence in AI-mediated co-viewing. We discuss challenges and future directions for applying this approach to diverse viewing contexts including entertainment, education, and collaborative watching experiences.
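The LLM-as-a-Judge pipeline described above can be pictured as a simple evaluate-and-refine loop: score the conversation on each of the five dimensions, identify the weak ones, and hand those back to the agents for a rewrite until all dimensions clear a threshold. The sketch below is illustrative only; the function names (`score_fn`, `rewrite_fn`), the 1-5 scale, the threshold, and the round budget are assumptions, not details from the paper.

```python
# Illustrative sketch of an evaluate-and-refine loop in the style of
# CompanionCast's LLM-as-a-Judge module. Only the five dimension names
# come from the abstract; everything else (score_fn standing in for an
# LLM judge call, rewrite_fn for the agent rewrite step, the 1-5 scale,
# threshold, and round budget) is a hypothetical placeholder.

DIMENSIONS = [
    "relevance",
    "authenticity",
    "engagement",
    "diversity",
    "personality_consistency",
]

def judge(conversation, score_fn):
    """Score a conversation on each dimension (1-5 scale assumed)."""
    return {dim: score_fn(conversation, dim) for dim in DIMENSIONS}

def refine_loop(conversation, score_fn, rewrite_fn,
                threshold=4.0, max_rounds=3):
    """Iteratively rewrite the conversation until every dimension
    meets the threshold or the round budget is exhausted."""
    scores = judge(conversation, score_fn)
    for _ in range(max_rounds):
        weak = [d for d, s in scores.items() if s < threshold]
        if not weak:
            break
        # Only the low-scoring dimensions are fed back to the agents.
        conversation = rewrite_fn(conversation, weak)
        scores = judge(conversation, score_fn)
    return conversation, scores
```

Targeting only the failing dimensions, rather than regenerating the whole conversation blindly, is what makes the loop an optimization pipeline rather than simple resampling.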