🤖 AI Summary
This paper addresses key challenges in 360° omnidirectional image (ODI) generation and editing—namely, spherical geometric distortion, the difficulty of modeling wide-field-of-view content, and task fragmentation—by proposing Omni², the first unified single-model framework for both ODI generation and editing. Methodologically, the authors introduce Any2Omni, a large-scale multi-task benchmark comprising over 60,000 samples across nine diverse tasks, and design an end-to-end diffusion model that integrates spherical coordinate-aware representations, multimodal conditional encoding, and a shared architecture. The contributions are threefold: (1) the first unified modeling of ODI generation and editing within a single framework; (2) overcoming the geometric adaptation bottleneck of conventional 2D models on spherical manifolds; and (3) significant improvements over state-of-the-art methods across multiple tasks, demonstrating strong generalization capability and geometric consistency.
📝 Abstract
$360^{\circ}$ omnidirectional images (ODIs) have gained considerable attention recently and are widely used in various virtual reality (VR) and augmented reality (AR) applications. However, capturing such images is expensive and requires specialized equipment, making ODI synthesis increasingly important. While common 2D image generation and editing methods are rapidly advancing, these models struggle to deliver satisfactory results when generating or editing ODIs due to the unique format and broad $360^{\circ}$ Field-of-View (FoV) of ODIs. To bridge this gap, we construct \textbf{\textit{Any2Omni}}, the first comprehensive ODI generation-editing dataset, comprising 60,000+ training samples covering diverse input conditions and up to 9 ODI generation and editing tasks. Built upon Any2Omni, we propose an \textbf{\underline{Omni}} model for \textbf{\underline{Omni}}-directional image generation and editing (\textbf{\textit{Omni$^2$}}), capable of handling various ODI generation and editing tasks under diverse input conditions with one model. Extensive experiments demonstrate the superiority and effectiveness of the proposed Omni$^2$ model on both ODI generation and editing tasks.