InfinityStory: Unlimited Video Generation with World Consistency and Character-Aware Shot Transitions

📅 2026-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses three challenges in long-form video generation: background inconsistency, discontinuous multi-character shot transitions, and limited scalability to hour-long narratives. To tackle these issues, the authors propose a video generation framework tailored for complex multi-agent scenes, featuring a background-consistency generation pipeline and a transition-aware synthesis module. This design preserves character identity and enables natural entrance and exit transitions across frames. The framework is trained on a newly constructed synthetic dataset of 10,000 multi-agent transition sequences. On VBench, it improves significantly over existing methods, scoring 88.94 in background consistency and 82.11 in subject consistency with the best average rank (2.80), thereby enhancing spatiotemporal coherence, transition smoothness, and long-range narrative capability.

📝 Abstract
Generating long-form storytelling videos with consistent visual narratives remains a significant challenge in video synthesis. We present a novel framework, dataset, and model that address three critical limitations: background consistency across shots, seamless multi-subject shot-to-shot transitions, and scalability to hour-long narratives. Our approach introduces a background-consistent generation pipeline that maintains visual coherence across scenes while preserving character identity and spatial relationships. We further propose a transition-aware video synthesis module that generates smooth shot transitions for complex scenarios involving multiple subjects entering or exiting frames, going beyond the single-subject limitations of prior work. To support this, we contribute a synthetic dataset of 10,000 multi-subject transition sequences covering underrepresented dynamic scene compositions. On VBench, InfinityStory achieves the highest Background Consistency (88.94), highest Subject Consistency (82.11), and the best overall average rank (2.80), showing improved stability, smoother transitions, and better temporal coherence.
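The abstract describes conditioning each shot on a shared background representation and per-character identities so that subjects can enter or exit while the world stays consistent. The paper does not publish its implementation here, so the following is only a minimal toy sketch of that control flow; every name and data structure (`WorldState`, `generate_shot`, `generate_story`) is hypothetical, not from the paper.

```python
# Hypothetical sketch of shot-chained generation with a shared world state.
# All names are invented for illustration; this is not the paper's code.
from dataclasses import dataclass, field

@dataclass
class WorldState:
    background: list                                  # shared background latent, reused across shots
    identities: dict = field(default_factory=dict)    # character name -> identity embedding

def generate_shot(state, characters, n_frames=4):
    """Toy 'generator': every frame is conditioned on the same background
    latent plus the identity embeddings of the characters in this shot."""
    frames = []
    for t in range(n_frames):
        frames.append({
            "background": state.background,           # never resampled -> background consistency
            "subjects": {c: state.identities[c] for c in characters},
            "t": t,
        })
    return frames

def generate_story(state, shot_plan):
    """Chain shots: the character set may change between shots (entrances
    and exits) while the background latent stays fixed."""
    video = []
    for characters in shot_plan:
        video.extend(generate_shot(state, characters))
    return video

state = WorldState(background=[0.1, 0.2, 0.3],
                   identities={"alice": [1.0], "bob": [2.0]})
video = generate_story(state, [["alice"], ["alice", "bob"], ["bob"]])
```

The key design point the sketch mirrors is that the background latent is created once and threaded through every shot, while per-shot character sets model the multi-subject entrance/exit transitions the paper targets.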
Problem

Research questions and friction points this paper is trying to address.

video generation
world consistency
character-aware transitions
long-form storytelling
multi-subject transitions
Innovation

Methods, ideas, or system contributions that make the work stand out.

world consistency
character-aware transitions
multi-subject video synthesis
long-form video generation
transition-aware synthesis
🔎 Similar Papers
Mohamed Elmoghany (Adobe Research)
Liangbing Zhao (MS/PhD, King Abdullah University of Science and Technology): Generative Models, MLLM
Xiaoqian Shen (CS PhD @ KAUST): Generative Models, Vision-Language
Subhojyoti Mukherjee (Adobe Research): Multi-armed Bandits, Reinforcement Learning, Large Language Models, RLHF
Yang Zhou (Adobe Research): Computer Graphics, Computer Vision, Machine Learning
Gang Wu (Adobe Research): Machine Learning, Artificial Intelligence, Statistical Learning, Data Mining
Viet Dac Lai (Adobe Research)
Seunghyun Yoon (Assistant Professor, Korea Institute of Energy Technology (KENTECH)): Reinforcement Learning, Deep Learning, Data Science, Networking, Cyber Security
Ryan Rossi (Adobe Research)
Abdullah Rashwan (PhD student, Computer Science Department, University of Waterloo): machine learning, probabilistic graphical models, deep learning
Puneet Mathur (Adobe Research)
Varun Manjunatha (Senior Research Scientist, Adobe Research): CV, NLP, LLMs
Daksh Dangi (Independent Researcher)
Chien Nguyen (University of Oregon)
Nedim Lipka (Adobe Systems Inc): Big Data Analytics, Machine Learning, Web Mining, Online Advertisement
Trung Bui (Adobe Research)
Krishna Kumar Singh (Adobe Research): Computer Vision, Machine Learning
Ruiyi Zhang (Adobe Research)
Xiaolei Huang (University of Memphis): Machine Learning, Natural Language Processing, Health Informatics, LLM for Sciences
Jaemin Cho (PhD Student at UNC Chapel Hill): Multimodal Learning, Natural Language Processing, Machine Learning
Yu Wang (Department of Computer Science, University of Oregon): Data Mining, Machine Learning, Neural-Symbolic Learning, Graph and Network, Structured Knowledge
Namyong Park (Meta AI): Machine Learning, Representation Learning, Graph Learning, Knowledge Reasoning, Complex Networks
Zhengzhong Tu (Texas A&M University, Google Research, University of Texas at Austin): Agentic AI, Trustworthy AI, Embodied AI
Hongjie Chen (Dolby Labs.): Graph, Time series, Visualization
Hoda Eldardiry (Associate Professor of Computer Science, Virginia Tech): Machine Learning