PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
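Addresses the inability of existing panoramic video generation models to leverage pre-trained text-to-video models. The paper proposes PanoWan, which improves generation quality through latitude-aware sampling and longitude-boundary handling modules, and builds the PanoVid dataset to support training.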

📝 Abstract
Panoramic video generation enables immersive 360° content creation, valuable in applications that demand scene-consistent world exploration. However, existing panoramic video generation models struggle to leverage pre-trained generative priors from conventional text-to-video models for high-quality and diverse panoramic video generation, due to limited dataset scale and the gap in spatial feature representations. In this paper, we introduce PanoWan to effectively lift pre-trained text-to-video models to the panoramic domain, equipped with minimal modules. PanoWan employs latitude-aware sampling to avoid latitudinal distortion, while its rotated semantic denoising and padded pixel-wise decoding ensure seamless transitions at longitude boundaries. To provide sufficient panoramic videos for learning these lifted representations, we contribute PanoVid, a high-quality panoramic video dataset with captions and diverse scenarios. Consequently, PanoWan achieves state-of-the-art performance in panoramic video generation and demonstrates robustness for zero-shot downstream tasks.
Problem

Research questions and friction points this paper is trying to address.

Leveraging pre-trained video models for 360° content generation
Addressing spatial distortion in panoramic video synthesis
Creating seamless transitions at longitude boundaries in videos
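
To make the spatial distortion problem concrete, here is a minimal sketch assuming the panoramic frames are stored in equirectangular projection (a common format, but an assumption here, since the listing does not name the projection). In that layout every pixel row spans the full 360° of longitude, so rows near the poles are stretched horizontally by a factor of 1/cos(latitude). The function names are hypothetical, and this illustrates the geometry only, not PanoWan's latitude-aware sampling.

```python
import math


def row_latitude(row: int, height: int) -> float:
    """Latitude in radians at the centre of a pixel row (row 0 is the top of the frame)."""
    return math.pi / 2 - (row + 0.5) / height * math.pi


def horizontal_stretch(row: int, height: int) -> float:
    """Factor by which an equirectangular row is wider than its circle of latitude."""
    return 1.0 / math.cos(row_latitude(row, height))


if __name__ == "__main__":
    height = 512
    for row in (0, 128, 255):
        print(row, round(horizontal_stretch(row, height), 3))
```

Rows near the equator have a stretch close to 1, while the top and bottom rows are stretched by a large factor, which is the kind of gap a model pre-trained on conventional perspective video has to bridge.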
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latitude-aware sampling prevents latitudinal distortion
Rotated semantic denoising ensures seamless transitions
Padded pixel-wise decoding handles longitude boundaries
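
The longitude-boundary items above rest on one property of a 360° frame: its left and right edges are neighbours on the sphere. The sketch below shows two generic array operations that exploit this, a circular shift along longitude and wrap-around padding. It assumes a frame stored as a list of pixel rows, and all function names are hypothetical. It is not PanoWan's rotated semantic denoising or padded pixel-wise decoding, only the wrap-around idea those names suggest.

```python
def roll_longitude(frame, shift):
    """Rotate a panorama about the vertical axis by shifting columns circularly."""
    width = len(frame[0])
    shift %= width
    return [row[width - shift:] + row[:width - shift] for row in frame]


def pad_longitude(frame, pad):
    """Extend each row with columns wrapped around from the opposite edge."""
    if pad == 0:
        return [list(row) for row in frame]
    return [row[-pad:] + row + row[:pad] for row in frame]


def crop_longitude(frame, pad):
    """Remove the wrapped columns added by pad_longitude."""
    if pad == 0:
        return [list(row) for row in frame]
    return [row[pad:-pad] for row in frame]


if __name__ == "__main__":
    frame = [[0, 1, 2, 3], [4, 5, 6, 7]]
    print(roll_longitude(frame, 1))
    print(pad_longitude(frame, 1))
```

A process that only sees local neighbourhoods can run on the padded frame and then be cropped back, so pixels at the seam see their true neighbours. Rolling the frame moves the seam to a different longitude, so no single column is always at the boundary.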
Authors

Yifei Xia
Peking University
ML systems, Diffusion Models, HPC

Shuchen Weng
Beijing Academy of Artificial Intelligence

Siqi Yang
University of Electronic Science and Technology of China
Generative Speech Enhancement, Automatic Speech Recognition, Diffusion Models

Jingqi Liu
State Key Lab of Multimedia Info. Processing, School of Computer Science, Peking University; Nat’l Eng. Research Ctr. of Visual Tech., School of Computer Science, Peking University

Chengxuan Zhu
Nat’l Key Lab of General AI, School of Intelligence Science and Technology, Peking University

Minggui Teng
State Key Lab of Multimedia Info. Processing, School of Computer Science, Peking University; Nat’l Eng. Research Ctr. of Visual Tech., School of Computer Science, Peking University

Zijian Jia
School of Artificial Intelligence, Beijing University of Posts and Telecommunications

Han Jiang
Johns Hopkins University
Natural Language Generation, Societal AI, Model Evaluation

Boxin Shi
Peking University
Computer Vision, Computational Photography