🤖 AI Summary
This work addresses the challenge of automatically generating animated data videos that integrate dynamic visualizations with synchronized narration. The task requires careful coordination of visual encoding, temporal progression, and narrative structure, yet it lacks a standardized evaluation benchmark. To bridge this gap, we introduce DataReel, the first benchmark designed specifically for animated data video storytelling, comprising 328 real-world stories. We further propose a large language model–based multi-agent framework that decomposes generation into planning, generation, and verification stages, mirroring key aspects of the human storytelling process. Experiments show that this approach outperforms direct prompting baselines in both automatic and human evaluations, while revealing persistent challenges in coordinating animation, narration, and visual emphasis.
📝 Abstract
Data videos are a powerful medium for data-driven visual storytelling, combining animated, chart-centric visualizations with synchronized narration. Widely used in journalism, education, and public communication, they help audiences understand complex data through clear and engaging visual explanations. Despite their growing impact, generating data-driven video stories remains challenging: it requires careful coordination of visual encoding, temporal progression, and narration, as well as substantial expertise in visualization design, animation, and video-editing tools. Recent advances in large language models offer new opportunities to automate this process; however, there is currently no benchmark for rigorously evaluating models on animated visualization-based video storytelling. To address this gap, we introduce DataReel, a benchmark for automated data-driven video story generation comprising 328 real-world stories. Each story pairs structured data, a chart visualization, and a narration transcript, enabling systematic evaluation of models' abilities to generate animated data video stories. We further propose a multi-agent framework that decomposes the task into planning, generation, and verification stages, mirroring key aspects of the human storytelling process. Experiments show that this multi-agent approach outperforms direct prompting baselines under both automatic and human evaluations, while revealing persistent challenges in coordinating animation, narration, and visual emphasis. We release DataReel at https://github.com/vis-nlp/DataReel.
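
To make the task structure concrete, the following is a minimal sketch of how one benchmark item and the three-stage decomposition might be represented. It is not the DataReel schema or the authors' implementation: the class names (`Story`, `Scene`), the field names, and the rule-based stage functions are all hypothetical stand-ins, and in the framework described above each stage would be handled by an LLM agent rather than by string matching.

```python
from dataclasses import dataclass


@dataclass
class Story:
    """One item pairing the three inputs named in the abstract (layout is assumed)."""
    data: list[dict]   # structured data rows; column names are hypothetical
    chart: dict        # chart specification; format is an assumption
    transcript: str    # narration transcript


@dataclass
class Scene:
    """One animated segment: a narration line plus the data field to emphasize."""
    narration: str
    highlight: str     # empty string means no visual emphasis was chosen


def plan(story: Story) -> list[str]:
    # Planning stand-in: treat each sentence of the transcript as one story beat.
    return [s.strip() for s in story.transcript.split(".") if s.strip()]


def generate(story: Story, beats: list[str]) -> list[Scene]:
    # Generation stand-in: emphasize the first data field mentioned in the beat.
    fields = list(story.data[0].keys()) if story.data else []
    scenes = []
    for beat in beats:
        target = next((f for f in fields if f.lower() in beat.lower()), "")
        scenes.append(Scene(narration=beat, highlight=target))
    return scenes


def verify(scenes: list[Scene]) -> list[str]:
    # Verification stand-in: flag scenes whose narration has no visual emphasis.
    return [
        f"scene {i}: no visual emphasis"
        for i, scene in enumerate(scenes)
        if not scene.highlight
    ]


def run_pipeline(story: Story) -> tuple[list[Scene], list[str]]:
    # Plan, then generate, then verify; issues could be fed back for revision.
    beats = plan(story)
    scenes = generate(story, beats)
    return scenes, verify(scenes)
```

Calling `run_pipeline` on a `Story` returns the generated scenes together with a list of verification issues. Keeping verification as a separate stage gives a natural point at which mismatches between narration and visual emphasis can be caught, which is the kind of coordination problem the abstract identifies as a persistent challenge.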