JoPano: Unified Panorama Generation via Joint Modeling

📅 2025-12-07
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing panoramic image generation methods face two key bottlenecks: (1) U-Net-based architectures inherently limit visual fidelity, and (2) text-to-panorama and view-to-panorama tasks are modeled separately, leading to redundancy and inefficiency. To address these, we propose the first DiT-based unified generative framework for panoramic synthesis. Our method employs cubic mapping to enable multi-view cooperative modeling, introduces a Joint-Face Adapter with conditional switching to achieve end-to-end joint optimization of both tasks for the first time, and incorporates Poisson blending to mitigate seam artifacts. We further propose Seam-SSIM and Seam-Sobel, novel metrics that quantify seam consistency across adjacent faces. Extensive experiments demonstrate state-of-the-art performance on FID, CLIP-FID, Inception Score (IS), and CLIP-Score, significantly improving both visual quality and cross-view consistency of generated panoramas.

๐Ÿ“ Abstract
Panorama generation has recently attracted growing interest in the research community, with two core tasks, text-to-panorama and view-to-panorama generation. However, existing methods still face two major challenges: their U-Net-based architectures constrain the visual quality of the generated panoramas, and they usually treat the two core tasks independently, which leads to modeling redundancy and inefficiency. To overcome these challenges, we propose a joint-face panorama (JoPano) generation approach that unifies the two core tasks within a DiT-based model. To transfer the rich generative capabilities of existing DiT backbones learned from natural images to the panorama domain, we propose a Joint-Face Adapter built on the cubemap representation of panoramas, which enables a pretrained DiT to jointly model and generate different views of a panorama. We further apply Poisson Blending to reduce seam inconsistencies that often appear at the boundaries between cube faces. Correspondingly, we introduce Seam-SSIM and Seam-Sobel metrics to quantitatively evaluate the seam consistency. Moreover, we propose a condition switching mechanism that unifies text-to-panorama and view-to-panorama tasks within a single model. Comprehensive experiments show that JoPano can generate high-quality panoramas for both text-to-panorama and view-to-panorama generation tasks, achieving state-of-the-art performance on FID, CLIP-FID, IS, and CLIP-Score metrics.
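The cubemap representation mentioned above can be illustrated with a minimal sketch that maps a point on a cube face to longitude/latitude and then to an equirectangular pixel. The face names, axis convention (+z forward, +x right, +y up), and function names are assumptions chosen for illustration; the abstract does not specify the convention JoPano uses.

```python
import numpy as np

# Assumed convention (hypothetical, not from the paper): each face maps
# (u, v) in [-1, 1]^2 to a 3D direction (x, y, z), with +z forward,
# +x right, +y up. Adjacent faces share identical directions on their edges.
FACE_DIRECTIONS = {
    "front": lambda u, v: (u, v, np.ones_like(u)),
    "back":  lambda u, v: (-u, v, -np.ones_like(u)),
    "right": lambda u, v: (np.ones_like(u), v, -u),
    "left":  lambda u, v: (-np.ones_like(u), v, u),
    "up":    lambda u, v: (u, np.ones_like(u), -v),
    "down":  lambda u, v: (u, -np.ones_like(u), v),
}


def face_to_lonlat(face, u, v):
    """Map face coordinates (u, v) in [-1, 1] to longitude/latitude in radians."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    x, y, z = FACE_DIRECTIONS[face](u, v)
    lon = np.arctan2(x, z)               # in [-pi, pi], 0 = forward
    lat = np.arctan2(y, np.hypot(x, z))  # in [-pi/2, pi/2], positive = up
    return lon, lat


def lonlat_to_equirect_pixel(lon, lat, width, height):
    """Map longitude/latitude to (column, row) in an equirectangular image."""
    col = (lon / (2 * np.pi) + 0.5) * width
    row = (0.5 - lat / np.pi) * height
    return col, row
```

Under this convention, the center of the front face lands at the center of the equirectangular image, and points on a shared edge of two faces map to the same longitude and latitude, which is the property that seam handling relies on.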
Problem

Research questions and friction points this paper is trying to address.

Unify text-to-panorama and view-to-panorama generation tasks
Enhance visual quality of generated panoramas using DiT-based model
Reduce seam inconsistencies in panorama cube face boundaries
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified DiT-based model for panorama generation tasks
Joint-Face Adapter transfers generative capabilities to panorama domain
Poisson Blending reduces seam inconsistencies between cube faces
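To make the idea of seam consistency concrete, here is a minimal sketch of a simple seam score for two horizontally adjacent cube faces. This is a hypothetical illustration, not the paper's Seam-SSIM or Seam-Sobel definition, which this summary does not give; the function names and the ratio-based normalization are assumptions.

```python
import numpy as np


def seam_jump(face_a, face_b):
    """Mean absolute intensity jump where face_a's right edge meets face_b's left edge."""
    a = np.asarray(face_a, dtype=float)
    b = np.asarray(face_b, dtype=float)
    return float(np.mean(np.abs(a[:, -1] - b[:, 0])))


def interior_jump(face):
    """Mean absolute difference between horizontally neighboring pixels inside a face."""
    f = np.asarray(face, dtype=float)
    return float(np.mean(np.abs(np.diff(f, axis=1))))


def seam_ratio(face_a, face_b, eps=1e-8):
    """Seam jump relative to typical interior variation.

    Values near 1 suggest the seam is no more visible than ordinary
    image content; much larger values suggest a visible discontinuity.
    """
    interior = 0.5 * (interior_jump(face_a) + interior_jump(face_b))
    return seam_jump(face_a, face_b) / (interior + eps)
```

For example, a horizontal ramp that continues smoothly from one face into the next gives a ratio close to 1, while offsetting the second face by a constant raises it in proportion to the offset.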