🤖 AI Summary
This work addresses single-image, style-driven multi-view appearance transfer for real-world 3D scenes, ensuring both semantic consistency and cross-view geometric coherence. We propose the first scene-level, semantics-driven appearance transfer framework: (i) instance-level dense semantic correspondences are established via open-vocabulary segmentation; (ii) a diffusion model's semantic attention is jointly optimized with a depth-guided, learnable warp-and-refine network to achieve geometric alignment and detail refinement. Our method generalizes to arbitrary novel views without fine-tuning and natively supports 3D-consistent stylization. Quantitative and qualitative evaluations demonstrate state-of-the-art performance in structural fidelity, perceptual style quality, and multi-view consistency. A user study confirms that the generated results exhibit high photorealism and strong semantic faithfulness.
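The instance-level correspondence step (i) can be illustrated with a minimal sketch: segments produced by an open-vocabulary segmenter carry text labels, and scene objects are paired with style regions that share a label. All names and data below are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of step (i): pair scene segments with style segments
# by their open-vocabulary label, so each object is stylized only from a
# semantically matching style region. Segment format is assumed here.

def match_segments(style_segments, scene_segments):
    """Pair scene segments with style segments sharing the same label.

    Each segment is a (label, mask_id) tuple; returns a mapping
    scene mask_id -> style mask_id (first style match per label).
    """
    style_by_label = {}
    for label, mask_id in style_segments:
        style_by_label.setdefault(label, mask_id)  # keep first instance per label
    matches = {}
    for label, mask_id in scene_segments:
        if label in style_by_label:
            matches[mask_id] = style_by_label[label]
    return matches  # unmatched scene segments (e.g. "rug") are simply skipped

style = [("sofa", 0), ("wall", 1), ("lamp", 2)]
scene = [("wall", 10), ("sofa", 11), ("rug", 12)]
print(match_segments(style, scene))
# {10: 1, 11: 0}
```

In the full method these label matches would be refined into dense pixel-wise correspondences; the sketch only shows the instance pairing.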
📝 Abstract
We introduce ReStyle3D, a novel framework for scene-level appearance transfer from a single style image to a real-world scene represented by multiple views. The method combines explicit semantic correspondences with multi-view consistency to achieve precise and coherent stylization. Unlike conventional stylization methods that apply a reference style globally, ReStyle3D uses open-vocabulary segmentation to establish dense, instance-level correspondences between the style and real-world images, ensuring that each object is stylized with semantically matched textures. It first transfers the style to a single view using a training-free semantic-attention mechanism in a diffusion model, then lifts the stylization to additional views via a learned warp-and-refine network guided by monocular depth and pixel-wise correspondences. Experiments show that ReStyle3D consistently outperforms prior methods in structure preservation, perceptual style similarity, and multi-view coherence. User studies further validate its ability to produce photo-realistic, semantically faithful results. Our code, pretrained models, and dataset will be publicly released to support new applications in interior design, virtual staging, and 3D-consistent stylization.
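The depth-guided warp that lifts a stylized view to additional views can be sketched as a standard reprojection: back-project each source pixel with its monocular depth, transform it into the target camera, and re-project to obtain per-pixel correspondences. This is a minimal geometric sketch under assumed camera parameters (`K`, `R`, `t`), not the paper's learned warp-and-refine network.

```python
import numpy as np

# Illustrative depth-guided warp: maps each source pixel to target-view
# coordinates via back-projection and re-projection. K is a pinhole
# intrinsic matrix; (R, t) is the assumed source-to-target rigid motion.

def warp_coordinates(depth, K, R, t):
    """Return a (2, H, W) array of target-view pixel coordinates."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1)   # homogeneous pixels
    cam = np.linalg.inv(K) @ pix * depth.reshape(1, -1)              # back-project with depth
    cam_t = R @ cam + t.reshape(3, 1)                                # move into target camera
    proj = K @ cam_t                                                 # re-project
    return (proj[:2] / proj[2:]).reshape(2, h, w)

K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 32.0],
              [0.0,   0.0,  1.0]])
depth = np.full((64, 64), 2.0)                    # fronto-parallel plane, 2 m away
coords = warp_coordinates(depth, K, np.eye(3), np.zeros(3))
# Sanity check: with the identity pose, every pixel maps to itself.
print(np.allclose(coords[0], np.arange(64)[None, :]))
# True
```

In the full pipeline, these correspondences would drive the warp, after which a refinement network fills occlusions and restores detail.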