AuDirector: A Self-Reflective Closed-Loop Framework for Immersive Audio Storytelling

📅 2026-05-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses three limitations of existing long-form audio narrative systems: inconsistencies between character personas and vocal delivery, a lack of self-correction capabilities, and insufficient human–AI interaction. To overcome them, the authors propose a self-reflective, closed-loop multi-agent framework that integrates identity-aware voice adaptation, collaborative synthesis with automatic error correction, and interactive script refinement driven by natural language feedback. By unifying emotion-instructed speech synthesis, audio flaw detection with regeneration, and multi-agent coordination, the approach is reported to outperform state-of-the-art baselines in narrative coherence, emotional expressiveness, and acoustic fidelity.
📝 Abstract
Despite advances in text and visual generation, creating coherent long-form audio narratives remains challenging. Existing frameworks often exhibit limitations such as mismatches between character settings and voice performance, insufficient self-correction mechanisms, and limited human interactivity. To address these challenges, we propose AuDirector, a self-reflective closed-loop multi-agent framework. Specifically, it includes an Identity-Aware Pre-production mechanism that transforms narrative texts into character profiles and utterance-level emotional instructions, which are used to retrieve suitable voice candidates and guide expressive speech synthesis, thereby promoting context-aligned voice adaptation. To enhance quality, a Collaborative Synthesis and Correction module introduces a closed-loop self-correction mechanism that systematically audits and regenerates defective audio components. Furthermore, a Human-Guided Interactive Refinement module facilitates user control by interpreting natural language feedback to interactively refine the underlying scripts. Experiments demonstrate that AuDirector achieves superior performance compared to state-of-the-art baselines in structural coherence, emotional expressiveness, and acoustic fidelity. Audio samples can be found at https://anonymous-itsh.github.io/.
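The closed-loop self-correction idea in the abstract (synthesize, audit, regenerate defective audio) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Clip`, `closed_loop_synthesis`, `synthesize`, `audit`, and `max_retries` are hypothetical, and the synthesizer and auditor below are toy stand-ins for a real TTS model and audio flaw detector.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Clip:
    utterance_id: int
    text: str
    audio: bytes
    attempts: int   # total number of synthesis calls made for this line
    passed: bool    # whether the final audio passed the audit


def closed_loop_synthesis(
    lines: List[str],
    synthesize: Callable[[str, int], bytes],  # hypothetical: (text, attempt) -> audio
    audit: Callable[[str, bytes], bool],      # hypothetical: True if audio is acceptable
    max_retries: int = 2,                     # assumed cap on regenerations per line
) -> List[Clip]:
    """Synthesize each line, audit it, and regenerate it while the audit fails."""
    clips = []
    for idx, text in enumerate(lines):
        attempt = 0
        audio = synthesize(text, attempt)
        passed = audit(text, audio)
        # Regenerate only the defective clip, up to max_retries times.
        while not passed and attempt < max_retries:
            attempt += 1
            audio = synthesize(text, attempt)
            passed = audit(text, audio)
        clips.append(Clip(idx, text, audio, attempt + 1, passed))
    return clips


# Toy stand-ins (assumptions for illustration only).
def fake_synthesize(text: str, attempt: int) -> bytes:
    return f"{text}#{attempt}".encode()


def fake_audit(text: str, audio: bytes) -> bool:
    # Pretend the first take of one specific line is defective.
    return not (text == "The storm hit." and audio.endswith(b"#0"))


if __name__ == "__main__":
    for clip in closed_loop_synthesis(
        ["Hello.", "The storm hit."], fake_synthesize, fake_audit
    ):
        print(clip.utterance_id, clip.attempts, clip.passed)
```

Capping regenerations per line keeps the loop from running indefinitely when a clip never passes; in that case the sketch returns the last attempt with `passed=False` so a downstream step (or a human) can decide what to do with it.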
Problem

Research questions and friction points this paper is trying to address.

audio storytelling
voice-character mismatch
self-correction
human interactivity
long-form narrative
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-reflective framework
identity-aware voice synthesis
closed-loop correction
interactive audio storytelling
emotional expressiveness