🤖 AI Summary
To address the challenges of semantic editing in 360° panoramas, the SE360 framework is proposed: it combines hierarchical data construction and a two-stage data refinement strategy with a Transformer-based diffusion model to enable flexible, multi-condition-guided editing.
📝 Abstract
While instruction-based image editing is emerging, extending it to 360$^\circ$ panoramas introduces additional challenges. Existing methods often produce implausible results in both equirectangular projections (ERP) and perspective views. To address these limitations, we propose SE360, a novel framework for multi-condition guided object editing in 360$^\circ$ panoramas. At its core is a novel coarse-to-fine autonomous data generation pipeline that requires no manual intervention. This pipeline leverages a Vision-Language Model (VLM) and adaptive projection adjustment for hierarchical analysis, ensuring the holistic segmentation of objects and their physical context. The resulting data pairs are both semantically meaningful and geometrically consistent, even when sourced from unlabeled panoramas. Furthermore, we introduce a cost-effective, two-stage data refinement strategy to improve data realism and mitigate model overfitting to erasure artifacts. Based on the constructed dataset, we train a Transformer-based diffusion model that allows flexible object editing guided by text, a mask, or a reference image in 360$^\circ$ panoramas. Experiments demonstrate that our method outperforms existing approaches in both visual quality and semantic accuracy.
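The abstract contrasts equirectangular projections (ERP) with perspective views; the sketch below (not the authors' code) shows the standard ERP-to-perspective reprojection underlying this kind of view extraction. The field of view, view angles, and output size are illustrative assumptions, and sampling is nearest-neighbor for brevity.

```python
# Minimal sketch: render a pinhole perspective view from an equirectangular panorama.
import numpy as np

def erp_to_perspective(erp, fov_deg=90.0, yaw_deg=0.0, pitch_deg=0.0, out_hw=(512, 512)):
    """erp: H x W x 3 equirectangular image; returns an out_hw perspective view."""
    H, W = erp.shape[:2]
    out_h, out_w = out_hw
    f = 0.5 * out_w / np.tan(np.radians(fov_deg) / 2.0)  # focal length in pixels

    # Pixel grid -> camera-space ray directions (z forward, x right, y down).
    xs = np.arange(out_w) - (out_w - 1) / 2.0
    ys = np.arange(out_h) - (out_h - 1) / 2.0
    x, y = np.meshgrid(xs, ys)
    dirs = np.stack([x, y, np.full_like(x, f)], axis=-1)
    dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)

    # Rotate rays by pitch (about x) then yaw (about y) to pick the viewing direction.
    p, t = np.radians(pitch_deg), np.radians(yaw_deg)
    Rx = np.array([[1, 0, 0], [0, np.cos(p), -np.sin(p)], [0, np.sin(p), np.cos(p)]])
    Ry = np.array([[np.cos(t), 0, np.sin(t)], [0, 1, 0], [-np.sin(t), 0, np.cos(t)]])
    dirs = dirs @ (Ry @ Rx).T

    # Ray direction -> longitude/latitude -> ERP pixel coordinates.
    lon = np.arctan2(dirs[..., 0], dirs[..., 2])        # [-pi, pi] across the width
    lat = np.arcsin(np.clip(dirs[..., 1], -1.0, 1.0))   # [-pi/2, pi/2] down the height
    u = ((lon / np.pi + 1.0) * 0.5 * (W - 1)).astype(int)
    v = ((lat / (np.pi / 2) + 1.0) * 0.5 * (H - 1)).astype(int)
    return erp[np.clip(v, 0, H - 1), np.clip(u, 0, W - 1)]
```

Because straight lines and object shapes are only locally undistorted in such perspective crops, edits that look plausible in one view can break geometric consistency in the full ERP image, which is the failure mode the paper targets.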