ReStyle3D: Scene-Level Appearance Transfer with Semantic Correspondences

πŸ“… 2025-02-14
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses style-driven appearance transfer from a single style image to real-world 3D scenes captured from multiple views, ensuring both semantic consistency and cross-view geometric coherence. We propose the first scene-level, semantics-driven appearance transfer framework: (i) instance-level dense semantic correspondences between the style image and the scene are established via open-vocabulary segmentation; (ii) a training-free semantic-attention mechanism in a diffusion model stylizes a first view, and a learned warp-and-refine network guided by monocular depth and pixel-wise correspondences lifts the stylization to further views, handling geometric alignment and detail refinement. The method generalizes to additional views without per-scene fine-tuning and natively supports 3D-consistent stylization. Quantitative and qualitative evaluations demonstrate state-of-the-art performance in structural fidelity, perceptual style quality, and multi-view consistency, and a user study confirms that the results are photorealistic and semantically faithful.
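To make the first step concrete, below is a minimal, hypothetical Python sketch of instance-level semantic matching: it assumes an open-vocabulary segmenter has already produced labelled instance masks for the style image and one scene view (each instance is reduced here to a class label and a mask area), and it pairs every scene instance with a style instance of the same label. The `Segment` and `match_instances` names are illustrative only and are not from the paper or its released code.

```python
# Hypothetical sketch: label-based instance matching between a scene view
# and a style image. Segmentation outputs are assumed to come from an
# open-vocabulary segmenter run separately on both images.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Segment:
    label: str       # open-vocabulary class name, e.g. "sofa"
    mask_area: int   # pixel count of the instance mask (stand-in for the mask)


def match_instances(scene: Dict[int, Segment],
                    style: Dict[int, Segment]) -> Dict[int, Optional[int]]:
    """Assign each scene instance a style instance with the same label.

    Ties are broken by mask area, so e.g. the largest "sofa" in the scene
    is matched to the largest "sofa" in the style image.
    """
    # Group style instances by label, largest first.
    by_label: Dict[str, List[int]] = {}
    for sid, seg in sorted(style.items(), key=lambda kv: -kv[1].mask_area):
        by_label.setdefault(seg.label, []).append(sid)

    matches: Dict[int, Optional[int]] = {}
    for iid, seg in sorted(scene.items(), key=lambda kv: -kv[1].mask_area):
        candidates = by_label.get(seg.label, [])
        # Reuse the largest style instance of the same label; scene objects
        # with no match fall back to None (to be stylized globally later).
        matches[iid] = candidates[0] if candidates else None
    return matches


if __name__ == "__main__":
    scene = {0: Segment("sofa", 12000), 1: Segment("rug", 8000), 2: Segment("lamp", 900)}
    style = {10: Segment("sofa", 9000), 11: Segment("rug", 7000)}
    print(match_instances(scene, style))   # {0: 10, 1: 11, 2: None}
```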

πŸ“ Abstract
We introduce ReStyle3D, a novel framework for scene-level appearance transfer from a single style image to a real-world scene represented by multiple views. The method combines explicit semantic correspondences with multi-view consistency to achieve precise and coherent stylization. Unlike conventional stylization methods that apply a reference style globally, ReStyle3D uses open-vocabulary segmentation to establish dense, instance-level correspondences between the style and real-world images, ensuring that each object is stylized with semantically matched textures. It first transfers the style to a single view using a training-free semantic-attention mechanism in a diffusion model, then lifts the stylization to additional views via a learned warp-and-refine network guided by monocular depth and pixel-wise correspondences. Experiments show that ReStyle3D consistently outperforms prior methods in structure preservation, perceptual style similarity, and multi-view coherence. User studies further validate its ability to produce photorealistic, semantically faithful results. Our code, pretrained models, and dataset will be publicly released to support new applications in interior design, virtual staging, and 3D-consistent stylization.
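As a rough illustration of the lifting step, the sketch below covers only the depth-guided warping: it backward-warps a stylized source view into a target view using a monocular depth map for the target view, camera intrinsics `K`, and a relative pose `(R, t)` from the target to the source camera, all of which are assumed inputs. Occluded or out-of-frame pixels are left empty for a refinement stage, which is not shown; this is a minimal sketch, not the paper's warp-and-refine implementation.

```python
# Minimal numpy sketch of depth-guided backward warping between views.
# The learned refinement network that fixes occlusions and missing detail
# is omitted; only the geometric warp is illustrated.
import numpy as np


def warp_to_novel_view(stylized_src, depth_tgt, K, R, t):
    """Backward-warp the stylized source image into the target view.

    stylized_src: (H, W, C) stylized source view.
    depth_tgt:    (H, W) depth of the target view (e.g. monocular estimate).
    K:            (3, 3) camera intrinsics, assumed shared by both views.
    R, t:         rotation and translation mapping target-camera points
                  into the source camera frame.
    """
    h, w = depth_tgt.shape
    # Pixel grid of the target view in homogeneous coordinates, shape (3, N).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T

    # Unproject target pixels to 3D using the target-view depth.
    pts_tgt = np.linalg.inv(K) @ pix * depth_tgt.reshape(1, -1)

    # Transform the points into the source camera and project them.
    pts_src = R @ pts_tgt + t.reshape(3, 1)
    proj = K @ pts_src
    us = proj[0] / np.clip(proj[2], 1e-6, None)
    vs = proj[1] / np.clip(proj[2], 1e-6, None)

    # Nearest-neighbour sampling of the stylized source; behind-camera or
    # out-of-bounds pixels stay black for the refinement stage to fill.
    out = np.zeros_like(stylized_src)
    valid = (pts_src[2] > 0) & (us >= 0) & (us < w) & (vs >= 0) & (vs < h)
    ui = np.round(us[valid]).astype(int)
    vi = np.round(vs[valid]).astype(int)
    out.reshape(-1, stylized_src.shape[-1])[valid] = stylized_src[vi, ui]
    return out
```

Nearest-neighbour sampling keeps the sketch short; a practical warp would use bilinear sampling and a learned network to refine disoccluded regions, as the paper describes.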
Problem

Research questions and friction points this paper is trying to address.

Scene-level appearance transfer
Semantic correspondences in stylization
Multi-view consistency in 3D stylization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free semantic-attention mechanism in a diffusion model
Open-vocabulary segmentation for dense instance-level correspondences
Depth-guided warp-and-refine network for multi-view coherence
πŸ”Ž Similar Papers
No similar papers found.