MUSE: A Multi-agent Framework for Unconstrained Story Envisioning via Closed-Loop Cognitive Orchestration

📅 2026-02-03

🤖 AI Summary

This work addresses the intent-execution gap in long-form audio-visual story generation, where semantic drift and identity inconsistency grow as sequences get longer. It proposes MUSE, a multi-agent storytelling framework built on closed-loop coordination. The approach formulates story generation as a closed-loop constraint enforcement problem and runs an iterative plan-execute-verify-revise loop. Narrative intent is translated into explicit, machine-executable controls over identity, spatial composition, and temporal continuity, and targeted multimodal feedback corrects violations during generation so that high-level intent is preserved over long horizons. The study also introduces MUSEBench, a reference-free evaluation protocol for open-ended storytelling, validated by human judgments. Experiments show that MUSE substantially improves long-horizon narrative coherence, cross-modal identity consistency, and cinematic quality compared with representative baselines.

📝 Abstract

Generating long-form audio-visual stories from a short user prompt remains challenging due to an intent-execution gap, where high-level narrative intent must be preserved across coherent, shot-level multimodal generation over long horizons. Existing approaches typically rely on feed-forward pipelines or prompt-only refinement, which often leads to semantic drift and identity inconsistency as sequences grow longer. We address this challenge by formulating storytelling as a closed-loop constraint enforcement problem and propose MUSE, a multi-agent framework that coordinates generation through an iterative plan-execute-verify-revise loop. MUSE translates narrative intent into explicit, machine-executable controls over identity, spatial composition, and temporal continuity, and applies targeted multimodal feedback to correct violations during generation. To evaluate open-ended storytelling without ground-truth references, we introduce MUSEBench, a reference-free evaluation protocol validated by human judgments. Experiments demonstrate that MUSE substantially improves long-horizon narrative coherence, cross-modal identity consistency, and cinematic quality compared with representative baselines.
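The plan-execute-verify-revise loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`Shot`, `plan`, `execute`, `verify`, `revise`, `run`, `max_rounds`) is hypothetical, the "generator" is a stub that drifts on any control it has not been told to enforce, and the only detail taken from the abstract is the set of three control axes (identity, spatial composition, temporal continuity).

```python
from dataclasses import dataclass, field

# Control axes named in the abstract; everything else is an assumption.
CONTROL_AXES = ("identity", "spatial", "temporal")


@dataclass
class Shot:
    index: int
    controls: dict                                # explicit controls the shot must satisfy
    enforced: set = field(default_factory=set)    # axes the generator is told to enforce
    output: dict = field(default_factory=dict)    # stand-in for generated media attributes


def plan(prompt: str, num_shots: int) -> list:
    """Planner stub: turn the user prompt into per-shot controls."""
    return [
        Shot(i, {
            "identity": f"hero of '{prompt}'",
            "spatial": f"layout-{i}",
            "temporal": f"follows-shot-{i - 1}" if i else "opening",
        })
        for i in range(num_shots)
    ]


def execute(shot: Shot) -> None:
    """Generator stub: honors only enforced axes and drifts on the rest."""
    shot.output = {
        axis: shot.controls[axis] if axis in shot.enforced else "drifted"
        for axis in CONTROL_AXES
    }


def verify(shot: Shot) -> list:
    """Verifier stub: list the axes whose output violates its control."""
    return [a for a in CONTROL_AXES if shot.output.get(a) != shot.controls[a]]


def revise(shot: Shot, violations: list) -> None:
    """Reviser stub: target only the violated axes on the next attempt."""
    shot.enforced.update(violations)


def run(prompt: str, num_shots: int = 3, max_rounds: int = 4):
    """Closed loop: regenerate each shot until it passes or rounds run out."""
    shots = plan(prompt, num_shots)
    log = []
    for shot in shots:
        for round_no in range(1, max_rounds + 1):
            execute(shot)
            violations = verify(shot)
            log.append((shot.index, round_no, violations))
            if not violations:
                break
            revise(shot, violations)
    return shots, log
```

Calling `run("a lighthouse keeper", num_shots=2)` returns the shots together with a log of `(shot index, round, violations)` tuples. In this toy setting each shot fails its first round on all three axes and passes on the second, which is the contrast with a feed-forward pipeline that would stop after the first `execute` call.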

Problem

Research questions and friction points this paper is trying to address.

story generation

narrative coherence

identity consistency

multimodal generation

long-horizon storytelling

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent framework

closed-loop orchestration

long-horizon storytelling