PanoWan: Lifting Diffusion Video Generation Models to 360° with Latitude/Longitude-aware Mechanisms

📅 2025-05-28



🤖 AI Summary

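Addresses the inability of existing panoramic video generation models to exploit pre-trained text-to-video models. It proposes PanoWan, which improves generation quality through a latitude-aware sampling module and longitude-boundary handling modules, and it builds the PanoVid dataset to support training.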


📝 Abstract

Panoramic video generation enables immersive 360° content creation, which is valuable in applications that demand scene-consistent world exploration. However, existing panoramic video generation models struggle to leverage the generative priors of pre-trained conventional text-to-video models to produce high-quality, diverse panoramic videos, owing to limited dataset scale and the gap in spatial feature representations. In this paper, we introduce PanoWan, which effectively lifts pre-trained text-to-video models to the panoramic domain with minimal additional modules. PanoWan employs latitude-aware sampling to avoid latitudinal distortion, while its rotated semantic denoising and padded pixel-wise decoding ensure seamless transitions at longitude boundaries. To provide sufficient panoramic videos for learning these lifted representations, we contribute PanoVid, a high-quality panoramic video dataset with captions and diverse scenarios. As a result, PanoWan achieves state-of-the-art performance in panoramic video generation and demonstrates robustness on zero-shot downstream tasks.
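The abstract does not spell out how latitude-aware sampling works. The Python sketch below illustrates only the underlying geometry. It assumes that latitudinal distortion comes from the equirectangular (ERP) projection stretching each row by 1/cos(latitude), and it draws initial noise with fewer independent samples in rows near the poles. The function name latitude_aware_noise and this particular scheme are hypothetical. They are not PanoWan's published method.

```python
import math
import random


def latitude_aware_noise(height, width, seed=0):
    """Hypothetical latitude-aware noise for an ERP grid (illustrative only).

    Row i is centred at latitude phi_i. On the sphere that row's circumference
    is proportional to cos(phi_i), so it receives only
    max(1, round(width * cos(phi_i))) independent N(0, 1) samples. These are
    spread across the row by nearest-neighbour lookup, which keeps the
    per-pixel variance at 1.
    """
    rng = random.Random(seed)
    grid = []
    for i in range(height):
        # Latitude of the row centre, running from +pi/2 (top) to -pi/2 (bottom).
        phi = math.pi / 2 - (i + 0.5) * math.pi / height
        n_eff = max(1, round(width * math.cos(phi)))
        samples = [rng.gauss(0.0, 1.0) for _ in range(n_eff)]
        # Map each of the `width` ERP columns onto one of the n_eff samples.
        row = [samples[j * n_eff // width] for j in range(width)]
        grid.append(row)
    return grid


noise = latitude_aware_noise(8, 16)
print([len(set(row)) for row in noise])  # → [3, 9, 13, 16, 16, 13, 9, 3]
```

In this sketch, rows near the equator keep one independent value per column. Rows near the poles repeat a few values, which mirrors how the ERP projection oversamples high latitudes.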

Problem


Leveraging pre-trained video models for 360° content generation

Addressing spatial distortion in panoramic video synthesis

Creating seamless transitions at longitude boundaries in videos
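One hypothetical way to make "seamless" concrete is to score the longitude seam of a single ERP frame. The score compares the jump from the last column back to the first column with the average jump between neighbouring columns. Both the metric and the name seam_gap are illustrative assumptions, not an evaluation protocol from the paper.

```python
def seam_gap(frame):
    """Hypothetical seam score for one ERP frame given as rows of numbers.

    Returns (mean |last column - first column|) divided by the mean
    difference between adjacent interior columns. A value near 1 means the
    wrap-around is about as smooth as the rest of the frame.
    """
    width = len(frame[0])

    def col_diff(a, b):
        # Mean absolute difference between columns a and b over all rows.
        return sum(abs(row[a] - row[b]) for row in frame) / len(frame)

    interior = sum(col_diff(j, j + 1) for j in range(width - 1)) / (width - 1)
    boundary = col_diff(width - 1, 0)
    if interior == 0:
        return 0.0 if boundary == 0 else float("inf")
    return boundary / interior


periodic = [[0, 1, 2, 3, 4, 3, 2, 1]] * 4       # wraps smoothly
ramp = [[float(j) for j in range(8)]] * 4        # jumps from 7 back to 0
print(seam_gap(periodic), seam_gap(ramp))        # → 1.0 7.0
```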

Innovation


Latitude-aware sampling prevents latitudinal distortion

Rotated semantic denoising ensures seamless transitions

Padded pixel-wise decoding handles longitude boundaries
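The two longitude-boundary mechanisms can be sketched under explicit assumptions. First, this sketch assumes that rotated semantic denoising circularly shifts the latent along the longitude axis around each denoising step; on an ERP grid such a shift is a yaw rotation, so the seam moves. Second, it assumes that padded pixel-wise decoding wraps a few latent columns around both edges before decoding and crops the extra pixels afterwards. The functions rotate_longitude, rotated_denoise, padded_decode, and toy_decode, as well as the per-step shift schedule, are hypothetical placeholders. A real pipeline would use the model's own denoiser and VAE decoder.

```python
def rotate_longitude(latent, shift):
    """Circularly shift each row along the width (longitude) axis.

    On an equirectangular grid this equals a yaw rotation of the panorama:
    content wraps around, so the left/right seam moves to a new location.
    """
    out = []
    for row in latent:
        s = shift % len(row)
        out.append(row[-s:] + row[:-s])  # s == 0 leaves the row unchanged
    return out


def rotated_denoise(latent, denoise_step, num_steps, shift):
    """Hypothetical rotated denoising loop.

    Before step t the latent is rotated by t * shift columns, denoised, and
    rotated back. Unless shift is a multiple of the width, the seam therefore
    falls on a different longitude at consecutive steps.
    """
    for t in range(num_steps):
        offset = t * shift
        rotated = rotate_longitude(latent, offset)
        rotated = denoise_step(rotated, t)
        latent = rotate_longitude(rotated, -offset)
    return latent


def toy_decode(latent, scale=2):
    """Stand-in for a VAE decoder: upsamples width by repeating columns."""
    return [[v for v in row for _ in range(scale)] for row in latent]


def padded_decode(latent, decode, pad, scale):
    """Wrap-pad the latent along longitude, decode, then crop the padding.

    `scale` is the decoder's width upsampling factor, used to locate the
    padded pixel columns that must be discarded.
    """
    width = len(latent[0])
    assert 1 <= pad <= width, "pad must be between 1 and the latent width"
    padded = [row[-pad:] + row + row[:pad] for row in latent]  # circular pad
    pixels = decode(padded)
    return [row[pad * scale:(pad + width) * scale] for row in pixels]


latent = [[1, 2, 3, 4], [5, 6, 7, 8]]
print(rotate_longitude(latent, 1))  # → [[4, 1, 2, 3], [8, 5, 6, 7]]
print(padded_decode(latent, toy_decode, pad=1, scale=2))
```

The toy decoder is pointwise, so padding changes nothing in this example. With a real convolutional decoder, the wrapped columns give pixels near both edges their true neighbours across the seam. This is a plausible reason why padding before decoding helps produce seamless boundaries.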
