DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training

📅 2025-10-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing panoramic image generation methods are constrained by the scarcity of high-quality real-world panoramic data, making it challenging to simultaneously achieve geometric fidelity and photorealism. To address this, we propose DiT360, a diffusion-based framework built on a data-centric hybrid training paradigm. DiT360 jointly leverages perspective and panoramic images for cross-domain knowledge guidance, incorporating perspective-image guidance, panoramic refinement, circular padding, and dual geometric losses (yaw- and cube-based) at both the pre-VAE image level and the post-VAE token level. This multi-module hybrid supervision improves consistency and realism in both cross-domain translation and intra-domain enhancement. Extensive experiments show that DiT360 outperforms state-of-the-art methods across eleven quantitative metrics on text-to-panorama generation, inpainting, and outpainting, with notable gains in boundary continuity and joint geometric-visual fidelity.

📝 Abstract
In this work, we propose DiT360, a DiT-based framework that performs hybrid training on perspective and panoramic data for panoramic image generation. We attribute the difficulty of maintaining geometric fidelity and photorealism mainly to the lack of large-scale, high-quality, real-world panoramic data; this data-centric view differs from prior methods that focus on model design. DiT360 comprises several key modules for inter-domain transformation and intra-domain augmentation, applied at both the pre-VAE image level and the post-VAE token level. At the image level, we incorporate cross-domain knowledge through perspective image guidance and panoramic refinement, which enhance perceptual quality while regularizing diversity and photorealism. At the token level, hybrid supervision is applied across multiple modules, including circular padding for boundary continuity, a yaw loss for rotational robustness, and a cube loss for distortion awareness. Extensive experiments on text-to-panorama generation, inpainting, and outpainting demonstrate that our method achieves better boundary consistency and image fidelity across eleven quantitative metrics. Our code is available at https://github.com/Insta360-Research-Team/DiT360.
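The circular padding mentioned in the abstract exploits the fact that the left and right edges of an equirectangular panorama are physically adjacent. A minimal sketch of the idea on raw pixel arrays (the paper applies its modules at the token level as well; this toy version and its names are ours, not the authors' implementation):

```python
import numpy as np

def circular_pad(pano: np.ndarray, pad: int) -> np.ndarray:
    """Horizontally wrap-pad an equirectangular panorama of shape (H, W, C).

    The left and right edges of a 360-degree panorama meet in the real
    scene, so padding copies columns from the opposite side rather than
    zeros, keeping convolutions seamless across the wrap-around seam.
    """
    left = pano[:, -pad:]    # rightmost columns wrap to the left border
    right = pano[:, :pad]    # leftmost columns wrap to the right border
    return np.concatenate([left, pano, right], axis=1)

# Toy panorama: 2 rows, 8 columns, 1 channel.
pano = np.arange(16).reshape(2, 8, 1)
padded = circular_pad(pano, pad=2)
print(padded.shape)  # (2, 12, 1)
```

The same effect is available in common frameworks as "circular" padding mode; the point is that boundary context comes from the opposite edge, which is what encourages left-right continuity in the generated panorama.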
Problem

Research questions and friction points this paper is trying to address.

Generating high-fidelity panoramic images with geometric fidelity
Addressing lack of large-scale real-world panoramic training data
Maintaining boundary consistency and photorealism in panoramic generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid training on perspective and panoramic data
Cross-domain knowledge at image and token levels
Circular padding for boundary continuity; yaw and cube losses for rotational robustness and distortion awareness
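The yaw loss rests on a simple geometric fact: rotating the camera about the vertical (yaw) axis corresponds exactly to a circular shift of pixel columns in equirectangular projection. A hedged sketch of one way such a consistency term could look (function names are ours; the paper's actual loss may operate on VAE tokens rather than raw pixels):

```python
import numpy as np

def yaw_rotate(pano: np.ndarray, shift: int) -> np.ndarray:
    """Yaw-rotate an equirectangular panorama (H, W, C).

    A rotation about the vertical axis is exactly a circular shift of
    the pixel columns in equirectangular projection, so np.roll along
    the width axis implements it losslessly.
    """
    return np.roll(pano, shift, axis=1)

def yaw_loss(pred: np.ndarray, target: np.ndarray, shift: int) -> float:
    """Hypothetical yaw-consistency term: L2 distance between the
    yaw-rotated prediction and the equally rotated target, so the model
    is penalized identically regardless of where the seam falls."""
    return float(np.mean((yaw_rotate(pred, shift) - yaw_rotate(target, shift)) ** 2))

pano = np.random.default_rng(0).random((4, 16, 3))
loss_same = yaw_loss(pano, pano, shift=5)  # identical images: loss is 0.0
```

Because `yaw_rotate` is lossless and invertible, applying random yaw shifts during training augments the data without any interpolation artifacts, which is what makes it attractive given scarce panoramic data.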