AuDirector: A Self-Reflective Closed-Loop Framework for Immersive Audio Storytelling

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses key limitations in existing long-form audio narrative systems: inconsistencies between character personas and vocal delivery, a lack of self-correction capabilities, and insufficient human–AI interaction. To overcome these challenges, the authors propose AuDirector, a self-reflective, closed-loop multi-agent framework that integrates identity-aware voice adaptation, collaborative synthesis with automatic error correction, and interactive script refinement driven by natural language feedback. By unifying emotion-instructed speech synthesis, audio flaw detection with regeneration, and multi-agent coordination, the approach is reported to outperform state-of-the-art baselines in structural coherence, emotional expressiveness, and acoustic fidelity.

📝 Abstract

Despite advances in text and visual generation, creating coherent long-form audio narratives remains challenging. Existing frameworks often suffer from mismatches between character settings and voice performance, insufficient self-correction mechanisms, and limited human interactivity. To address these challenges, we propose AuDirector, a self-reflective closed-loop multi-agent framework. Specifically, it includes an Identity-Aware Pre-production mechanism that transforms narrative texts into character profiles and utterance-level emotional instructions, which are used to retrieve suitable voice candidates and guide expressive speech synthesis, thereby promoting context-aligned voice adaptation. To enhance quality, a Collaborative Synthesis and Correction module introduces a closed-loop self-correction mechanism that systematically audits and regenerates defective audio components. Furthermore, a Human-Guided Interactive Refinement module gives users control by interpreting natural language feedback and refining the underlying scripts accordingly. Experiments demonstrate that AuDirector achieves superior performance compared to state-of-the-art baselines in structural coherence, emotional expressiveness, and acoustic fidelity. Audio samples can be found at https://anonymous-itsh.github.io/.
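
The closed-loop self-correction idea in the abstract (synthesize, audit, regenerate only what is defective) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Segment` structure, the `closed_loop_synthesis` function, the `max_attempts` cap, and the stand-in `fake_synthesize` and `fake_audit` components are all hypothetical names introduced here for illustration, and a real system would plug in an actual speech synthesizer and audio flaw detector.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """One utterance plus its emotional instruction (hypothetical structure)."""
    text: str
    emotion: str
    audio: bytes = b""
    attempts: int = 0
    passed: bool = False


def closed_loop_synthesis(
    segments: List[Segment],
    synthesize: Callable[[str, str, int], bytes],
    audit: Callable[[Segment], bool],
    max_attempts: int = 3,  # assumed retry cap, not taken from the paper
) -> List[Segment]:
    """Synthesize each segment, audit it, and regenerate only the ones that fail."""
    for seg in segments:
        while not seg.passed and seg.attempts < max_attempts:
            seg.audio = synthesize(seg.text, seg.emotion, seg.attempts)
            seg.attempts += 1
            seg.passed = audit(seg)
    return segments


# Stand-in components so the sketch runs without any audio library.
def fake_synthesize(text: str, emotion: str, attempt: int) -> bytes:
    return f"{emotion}:{text}:take{attempt}".encode()


def fake_audit(seg: Segment) -> bool:
    # Pretend the first take of any "angry" line is defective.
    return not (seg.emotion == "angry" and seg.attempts == 1)


script = [Segment("Good morning.", "calm"), Segment("Get out!", "angry")]
result = closed_loop_synthesis(script, fake_synthesize, fake_audit)
```

In this sketch, only the segment that fails the audit is regenerated, and the retry cap keeps the loop from running forever when a segment never passes; such segments are returned with `passed` still set to `False` so a later stage (or a human) can decide what to do with them.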

Problem

Research questions and friction points this paper is trying to address.

audio storytelling

voice-character mismatch

self-correction

human interactivity

long-form narrative

Innovation

Methods, ideas, or system contributions that make the work stand out.

self-reflective framework

identity-aware voice synthesis

closed-loop correction