🤖 AI Summary
To address the growing isolation of modern media consumption and the lack of shared social presence during co-viewing, this paper proposes a multi-agent conversational AI framework for real-time, content-aware collaborative viewing. Methodologically, it integrates multimodal video understanding, role-aware large language model (LLM) agents, and spatial-audio-driven speech synthesis, and introduces an "LLM-as-a-Judge" module that scores conversations along five dimensions, forming a closed evaluation-generation optimization loop. Key contributions include: (1) the first generalizable multi-agent co-viewing orchestration framework; and (2) a reflective, dimension-wise dialogue-quality refinement mechanism. A pilot study in football match viewing suggests improvements in users' sense of social presence, and the authors discuss extending the framework to educational and entertainment scenarios.
📄 Abstract
Social presence is central to the enjoyment of watching content together, yet modern media consumption is increasingly solitary. We investigate whether multi-agent conversational AI systems can recreate the dynamics of shared viewing experiences across diverse content types. We present CompanionCast, a general framework for orchestrating multiple role-specialized AI agents that respond to video content using multimodal inputs, speech synthesis, and spatial audio. Distinctly, CompanionCast integrates an LLM-as-a-Judge module that iteratively scores and refines conversations across five dimensions (relevance, authenticity, engagement, diversity, personality consistency). We validate this framework through sports viewing, a domain with rich dynamics and strong social traditions, where a pilot study with soccer fans suggests that multi-agent interaction improves perceived social presence compared to solo viewing. We contribute: (1) a generalizable framework for orchestrating multi-agent conversations around multimodal video content, (2) a novel evaluator-agent pipeline for conversation quality control, and (3) exploratory evidence of increased social presence in AI-mediated co-viewing. We discuss challenges and future directions for applying this approach to diverse viewing contexts including entertainment, education, and collaborative watching experiences.
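The LLM-as-a-Judge pipeline described above can be pictured as a simple evaluate-and-refine loop: score the conversation on each of the five dimensions, identify the weak ones, and hand those back to the agents for a rewrite until all dimensions clear a threshold. The sketch below is illustrative only; the function names (`score_fn`, `rewrite_fn`), the 1-5 scale, the threshold, and the round budget are assumptions, not details from the paper.

```python
# Illustrative sketch of an evaluate-and-refine loop in the style of
# CompanionCast's LLM-as-a-Judge module. Only the five dimension names
# come from the abstract; everything else (score_fn standing in for an
# LLM judge call, rewrite_fn for the agent rewrite step, the 1-5 scale,
# threshold, and round budget) is a hypothetical placeholder.

DIMENSIONS = [
    "relevance",
    "authenticity",
    "engagement",
    "diversity",
    "personality_consistency",
]

def judge(conversation, score_fn):
    """Score a conversation on each dimension (1-5 scale assumed)."""
    return {dim: score_fn(conversation, dim) for dim in DIMENSIONS}

def refine_loop(conversation, score_fn, rewrite_fn,
                threshold=4.0, max_rounds=3):
    """Iteratively rewrite the conversation until every dimension
    meets the threshold or the round budget is exhausted."""
    scores = judge(conversation, score_fn)
    for _ in range(max_rounds):
        weak = [d for d, s in scores.items() if s < threshold]
        if not weak:
            break
        # Only the low-scoring dimensions are fed back to the agents.
        conversation = rewrite_fn(conversation, weak)
        scores = judge(conversation, score_fn)
    return conversation, scores
```

Targeting only the failing dimensions, rather than regenerating the whole conversation blindly, is what makes the loop an optimization pipeline rather than simple resampling.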