🤖 AI Summary
This work addresses the challenge of producing semantically coherent narratives with video diffusion models. Conventional agentic pipelines chain modules driven by independent, handcrafted prompts, which leads to semantic drift and cascading errors. To overcome these limitations, the authors propose Co-Director, a hierarchical multi-agent framework that formulates video storytelling as a global optimization problem. Under its hierarchical parameterization, a multi-armed bandit explores creative directions at the global level, while a local multimodal self-refinement loop mitigates identity drift and enforces sequence-level consistency, balancing exploration and exploitation. The authors report that the method significantly outperforms state-of-the-art baselines on the newly introduced GenAD-Bench, a 400-scenario dataset of fictional products for personalized advertising, and that it generalizes to broader cinematic narratives.
📝 Abstract
While diffusion models generate high-fidelity video clips, transforming them into coherent storytelling engines remains challenging. Current agentic pipelines automate this via chained modules but suffer from semantic drift and cascading failures due to independent, handcrafted prompting. We present Co-Director, a hierarchical multi-agent framework formalizing video storytelling as a global optimization problem. To ensure semantic coherence, we introduce hierarchical parameterization: a multi-armed bandit globally identifies promising creative directions, while a local multimodal self-refinement loop mitigates identity drift and ensures sequence-level consistency. This balances the exploration of novel narrative strategies with the exploitation of effective creative configurations. For evaluation, we introduce GenAD-Bench, a 400-scenario dataset of fictional products for personalized advertising. Experiments demonstrate that Co-Director significantly outperforms state-of-the-art baselines, offering a principled approach that seamlessly generalizes to broader cinematic narratives. Project Page: https://co-director-agent.github.io/
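The two-level idea described in the abstract, a global bandit over creative directions with a local refinement loop per attempt, can be sketched in a few lines. This is only an illustrative toy, not Co-Director's implementation: the abstract does not say which bandit algorithm is used, so UCB1 is swapped in as a standard choice, and the names `ucb1_select`, `refine`, `run_bandit`, `arm_quality`, and `critic_gain`, as well as the scalar "quality score" standing in for a multimodal critic, are all assumptions made for the example.

```python
import math
import random


def ucb1_select(counts, means, t, c=math.sqrt(2)):
    """Choose an arm with UCB1: try every arm once, then maximize mean + bonus."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    return max(
        range(len(counts)),
        key=lambda a: means[a] + c * math.sqrt(math.log(t) / counts[a]),
    )


def refine(score, critic_gain, max_rounds=3, target=0.9):
    """Toy local refinement loop (hypothetical stand-in for a multimodal critic).

    Each round closes a fixed fraction of the gap to a perfect score and the
    loop stops early once the target is reached.
    """
    for _ in range(max_rounds):
        if score >= target:
            break
        score = min(1.0, score + critic_gain * (1.0 - score))
    return score


def run_bandit(arm_quality, rounds=200, seed=0):
    """Global loop: pick a creative direction, draft, refine, update estimates."""
    rng = random.Random(seed)
    counts = [0] * len(arm_quality)
    means = [0.0] * len(arm_quality)
    for t in range(1, rounds + 1):
        arm = ucb1_select(counts, means, t)
        # Simulated draft quality for this direction, clipped to [0, 1].
        draft = min(1.0, max(0.0, rng.gauss(arm_quality[arm], 0.1)))
        reward = refine(draft, critic_gain=0.3)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running mean
    return counts, means


if __name__ == "__main__":
    counts, means = run_bandit([0.2, 0.5, 0.8])
    print("pulls per direction:", counts)
    print("estimated reward per direction:", [round(m, 3) for m in means])
```

In this sketch the bandit handles exploration across directions while the refinement loop improves each individual attempt before it is scored, which mirrors the exploration and exploitation split the abstract describes. In a real system the simulated draft and scalar score would be replaced by a video generator and a learned or model-based evaluator.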