🤖 AI Summary
This work addresses visual incoherence in long-form visual storytelling, which typically shows up as abrupt changes in character appearance, inconsistent backgrounds, and disjointed scene transitions. The authors propose CANVAS, a multi-agent framework that explicitly plans visual continuity across multiple shots, enforcing consistency through character continuity constraints, persistent background anchors, and location-aware scene planning. The study also introduces HardContinuityBench, a new benchmark for evaluating long-range narrative consistency. Evaluated on ST-BENCH and ViStoryBench alongside the new benchmark, CANVAS outperforms the best-performing baseline, improving background continuity by 21.6%, character consistency by 9.6%, and prop consistency by 7.6%.
📝 Abstract
Long-form visual storytelling requires maintaining continuity across shots, including consistent characters, stable environments, and smooth scene transitions. While existing generative models can produce strong individual frames, they fail to preserve such continuity, leading to appearance changes, inconsistent backgrounds, and abrupt scene shifts. We introduce CANVAS (Continuity-Aware Narratives via Visual Agentic Storyboarding), a multi-agent framework that explicitly plans visual continuity in multi-shot narratives. CANVAS enforces coherence through character continuity, persistent background anchors, and location-aware scene planning for smooth transitions within the same setting. We evaluate CANVAS on two storyboard generation benchmarks, ST-BENCH and ViStoryBench, and introduce a challenging new benchmark, HardContinuityBench, for long-range narrative consistency. CANVAS consistently outperforms the best-performing baseline, improving background continuity by 21.6%, character consistency by 9.6%, and props consistency by 7.6%.