Hollywood Town: Long-Video Generation via Cross-Modal Multi-Agent Orchestration

📅 2025-10-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address challenges in long-video generation, including low multi-agent collaboration efficiency, contextual inconsistency, and high memory overhead, this paper proposes OmniAgent, a framework inspired by cinematic production workflows. It introduces a hierarchical hypergraph architecture for modular task decomposition and scalable agent coordination; a directed cyclic graph mechanism with bounded retries that supports cross-modal context sharing, dynamic feedback, and iterative refinement; and a combination of context engineering with hierarchical control policies that reduces the cost of maintaining agent state. According to the summary, experiments show that OmniAgent improves temporal coherence, spatial resolution, and semantic consistency in long-duration video generation. The framework is presented as a scalable, robust paradigm for multi-agent collaborative generation in complex creative tasks.

📝 Abstract

Recent advancements in multi-agent systems have demonstrated significant potential for enhancing creative task performance, such as long video generation. This study introduces three innovations to improve multi-agent collaboration. First, we propose OmniAgent, a hierarchical, graph-based multi-agent framework for long video generation that leverages a film-production-inspired architecture to enable modular specialization and scalable inter-agent collaboration. Second, inspired by context engineering, we propose hypergraph nodes that enable temporary group discussions among agents lacking sufficient context, reducing individual memory requirements while ensuring adequate contextual information. Third, we transition from directed acyclic graphs (DAGs) to directed cyclic graphs with limited retries, allowing agents to reflect and refine outputs iteratively, thereby improving earlier stages through feedback from subsequent nodes. These contributions lay the groundwork for developing more robust multi-agent systems in creative tasks.
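The third contribution, a directed cyclic graph with limited retries, can be illustrated with a minimal sketch. This is not OmniAgent's implementation: the names `Revision`, `run_cyclic`, `max_retries`, and the toy `script` and `storyboard` nodes are all assumptions made for illustration. The sketch shows only the control flow: a later node may send feedback to an earlier node, and a per-node retry budget guarantees that the loop terminates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Revision:
    """Request from a later node to re-run an earlier one (hypothetical type)."""
    target: str    # name of the earlier node to re-run
    feedback: str  # what the earlier node should change


# A node reads all outputs so far plus optional feedback addressed to it,
# and returns its own output plus an optional revision request.
NodeFn = Callable[[Dict[str, str], Optional[str]], Tuple[str, Optional[Revision]]]


def run_cyclic(nodes: List[Tuple[str, NodeFn]], max_retries: int = 2):
    """Run nodes in order, following back edges until a retry budget is spent."""
    index = {name: i for i, (name, _) in enumerate(nodes)}
    outputs: Dict[str, str] = {}
    retries = {name: 0 for name, _ in nodes}
    pending: Dict[str, str] = {}  # feedback waiting for a node's next run
    i = 0
    while i < len(nodes):
        name, fn = nodes[i]
        out, revision = fn(outputs, pending.pop(name, None))
        outputs[name] = out
        if (
            revision is not None
            and index[revision.target] < i               # back edges only
            and retries[revision.target] < max_retries   # bounded retries
        ):
            retries[revision.target] += 1
            pending[revision.target] = revision.feedback
            i = index[revision.target]  # jump back; later nodes re-run too
            continue
        i += 1
    return outputs, retries


# Toy usage: the storyboard node rejects the first script draft once.
def script(outputs, feedback):
    return ("script v2 (shorter)" if feedback else "script v1"), None


def storyboard(outputs, feedback):
    if outputs["script"] == "script v1":
        return "", Revision("script", "too long for one scene")
    return f"storyboard for {outputs['script']}", None


outputs, retries = run_cyclic([("script", script), ("storyboard", storyboard)])
```

Because every backward jump consumes one unit of some node's budget, the total number of jumps is at most `max_retries` times the number of nodes, which is what separates this from an unbounded cycle.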

Problem

Research questions and friction points this paper is trying to address.

Developing a hierarchical multi-agent framework for long video generation

Enabling context sharing through hypergraph-based agent discussions

Implementing a cyclic refinement process with limited retries

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical graph-based multi-agent framework for video generation

Hypergraph nodes enable temporary group discussions among agents

Directed cyclic graphs with retries allow iterative output refinement
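
One way to picture the hypergraph-node idea is as a hyperedge that temporarily pools the context of several agents for a single task, so no agent has to store everything permanently. The sketch below is an assumption-laden illustration, not the paper's design: `Agent`, `HyperNode`, `discuss`, and the example keys are hypothetical names, and real agents would exchange messages rather than dictionary entries.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Set, Tuple


@dataclass
class Agent:
    """An agent with a small private memory (hypothetical structure)."""
    name: str
    memory: Dict[str, str] = field(default_factory=dict)


@dataclass(frozen=True)
class HyperNode:
    """A hyperedge joining any number of agents for a temporary discussion."""
    members: FrozenSet[str]

    def discuss(
        self, agents: Dict[str, Agent], needed: Set[str]
    ) -> Tuple[Dict[str, str], Set[str]]:
        """Pool members' memories and return only the requested keys.

        Agent memories are read, never modified, so the shared context
        exists only for the duration of the discussion.
        """
        pooled: Dict[str, str] = {}
        # Sorted order makes key conflicts deterministic: the member whose
        # name sorts last wins.
        for member in sorted(self.members):
            pooled.update(agents[member].memory)
        shared = {key: pooled[key] for key in needed if key in pooled}
        missing = {key for key in needed if key not in pooled}
        return shared, missing


# Toy usage: two agents each hold part of the context a shot needs.
agents = {
    "director": Agent("director", {"tone": "noir", "plot": "heist"}),
    "cinematographer": Agent("cinematographer", {"lens": "35mm"}),
}
node = HyperNode(frozenset({"director", "cinematographer"}))
shared, missing = node.discuss(agents, {"tone", "lens", "budget"})
```

Here `shared` carries only the requested entries that some member could supply, and `missing` lists what the group still lacks, which a caller could use to decide whether to widen the discussion.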

🔎 Similar Papers

MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence