🤖 AI Summary
This work addresses single-image, style-driven multi-view appearance transfer for real-world 3D scenes, ensuring both semantic consistency and cross-view geometric coherence. We propose the first scene-level, semantics-driven appearance transfer framework: (i) instance-level dense semantic correspondences are established via open-vocabulary segmentation; (ii) a diffusion model's semantic attention is jointly optimized with a depth-guided, learnable warp-and-refine network to achieve geometric alignment and detail refinement. Our method generalizes to arbitrary novel views without fine-tuning and natively supports 3D-consistent stylization. Quantitative and qualitative evaluations demonstrate state-of-the-art performance in structural fidelity, perceptual style quality, and multi-view consistency. A user study confirms that the generated results exhibit high photorealism and strong semantic faithfulness.
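The instance-level correspondence step (i) can be illustrated with a minimal sketch: segments produced by an open-vocabulary segmenter carry text labels, and scene objects are paired with style regions that share a label. All names and data below are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of step (i): pair scene segments with style segments
# by their open-vocabulary label, so each object is stylized only from a
# semantically matching style region. Segment format is assumed here.

def match_segments(style_segments, scene_segments):
    """Pair scene segments with style segments sharing the same label.

    Each segment is a (label, mask_id) tuple; returns a mapping
    scene mask_id -> style mask_id (first style match per label).
    """
    style_by_label = {}
    for label, mask_id in style_segments:
        style_by_label.setdefault(label, mask_id)  # keep first instance per label
    matches = {}
    for label, mask_id in scene_segments:
        if label in style_by_label:
            matches[mask_id] = style_by_label[label]
    return matches  # unmatched scene segments (e.g. "rug") are simply skipped

style = [("sofa", 0), ("wall", 1), ("lamp", 2)]
scene = [("wall", 10), ("sofa", 11), ("rug", 12)]
print(match_segments(style, scene))
# {10: 1, 11: 0}
```

In the full method these label matches would be refined into dense pixel-wise correspondences; the sketch only shows the instance pairing.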
📝 Abstract
We introduce ReStyle3D, a novel framework for scene-level appearance transfer from a single style image to a real-world scene represented by multiple views. The method combines explicit semantic correspondences with multi-view consistency to achieve precise and coherent stylization. Unlike conventional stylization methods that apply a reference style globally, ReStyle3D uses open-vocabulary segmentation to establish dense, instance-level correspondences between the style and real-world images, ensuring that each object is stylized with semantically matched textures. It first transfers the style to a single view using a training-free semantic-attention mechanism in a diffusion model, then lifts the stylization to additional views via a learned warp-and-refine network guided by monocular depth and pixel-wise correspondences. Experiments show that ReStyle3D consistently outperforms prior methods in structure preservation, perceptual style similarity, and multi-view coherence. User studies further validate its ability to produce photo-realistic, semantically faithful results. Our code, pretrained models, and dataset will be publicly released to support new applications in interior design, virtual staging, and 3D-consistent stylization.
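The depth-guided warp that lifts a stylized view to additional views can be sketched as a standard reprojection: back-project each source pixel with its monocular depth, transform it into the target camera, and re-project to obtain per-pixel correspondences. This is a minimal geometric sketch under assumed camera parameters (`K`, `R`, `t`), not the paper's learned warp-and-refine network.

```python
import numpy as np

# Illustrative depth-guided warp: maps each source pixel to target-view
# coordinates via back-projection and re-projection. K is a pinhole
# intrinsic matrix; (R, t) is the assumed source-to-target rigid motion.

def warp_coordinates(depth, K, R, t):
    """Return a (2, H, W) array of target-view pixel coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)   # homogeneous pixels
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)              # back-project with depth
    cam_t = R @ cam + t.reshape(3, 1)                                # move into target camera
    proj = K @ cam_t                                                 # re-project
    return (proj[:2] / proj[2:]).reshape(2, h, w)

K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 32.0],
              [0.0,   0.0,  1.0]])
depth = np.full((64, 64), 2.0)                    # fronto-parallel plane, 2 m away
coords = warp_coordinates(depth, K, np.eye(3), np.zeros(3))
# Sanity check: with the identity pose, every pixel maps to itself.
print(np.allclose(coords[0], np.arange(64)[None, :]))
# True
```

In the full pipeline, these correspondences would drive the warp, after which a refinement network fills occlusions and restores detail.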