🤖 AI Summary
Panoramic video generation faces two key challenges: precise control of complex motion and motion discontinuities at the panorama's spherical boundaries. To address these, the authors propose a spherical motion disentanglement framework. Leveraging spherical geometry, it explicitly decouples the highly dynamic camera rotation from the scene optical flow used as a condition, enabling precise control over large and complex motions; a spherical noise warping strategy promotes loop-consistent motion across panorama boundaries; and the model is trained on a large-scale, motion-rich panoramic video dataset with frame-level camera pose and optical flow annotations. This work is the first to simultaneously achieve motion disentanglement and cross-boundary temporal consistency in panoramic video generation. It significantly improves motion fidelity, visual quality, and temporal coherence, and demonstrates strong performance in applications such as motion transfer and video editing. The code, dataset, and models are publicly available.
📝 Abstract
Panoramic video generation has attracted growing attention due to its applications in virtual reality and immersive media. However, existing methods lack explicit motion control and struggle to generate scenes with large and complex motions. We propose PanFlow, a novel approach that exploits the spherical nature of panoramas to decouple the highly dynamic camera rotation from the input optical flow condition, enabling more precise control over large and dynamic motions. We further introduce a spherical noise warping strategy to promote loop consistency in motion across panorama boundaries. To support effective training, we curate a large-scale, motion-rich panoramic video dataset with frame-level pose and flow annotations. We also showcase the effectiveness of our method in various applications, including motion transfer and video editing. Extensive experiments demonstrate that PanFlow significantly outperforms prior methods in motion fidelity, visual quality, and temporal coherence. Our code, dataset, and models are available at https://github.com/chengzhag/PanFlow.
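To illustrate the loop-consistency intuition behind spherical noise warping (this is not the authors' implementation, just a minimal sketch under simplifying assumptions): on an equirectangular panorama, a pure yaw rotation of the camera corresponds to a circular shift along the longitude axis, so noise leaving the right boundary re-enters on the left and the seam stays consistent. The function name `warp_noise_yaw` is hypothetical.

```python
import numpy as np

def warp_noise_yaw(noise: np.ndarray, yaw_deg: float) -> np.ndarray:
    """Warp a 2D noise map on an equirectangular panorama by a yaw rotation.

    Assumption: the horizontal axis spans 360 degrees of longitude, so a yaw
    rotation is a circular shift of columns. Noise crossing the right edge
    wraps to the left edge, keeping the panorama seam consistent.
    """
    h, w = noise.shape
    shift = int(round(yaw_deg / 360.0 * w)) % w
    return np.roll(noise, shift, axis=1)

rng = np.random.default_rng(0)
noise = rng.standard_normal((64, 128))
# A quarter-turn yaw shifts the noise by w/4 = 32 columns, wrapping around.
warped = warp_noise_yaw(noise, 90.0)
```

A full 360-degree yaw maps the noise back onto itself, which is exactly the wrap-around property a generic (non-spherical) image-space warp would violate.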