🤖 AI Summary
Existing 3D style transfer methods struggle to model multi-view consistency effectively and suffer from unstable training dynamics in diffusion-based frameworks. This work proposes DiffStyle3D, a novel 3D style transfer paradigm that integrates diffusion models with 3D Gaussian representations. By leveraging self-attention mechanisms in the latent space, the approach aligns style and content features while incorporating geometry-guided multi-view consistency modeling. Specifically, an attention-aware loss function is introduced that exploits geometric information to establish cross-view correspondences and generate geometry-aware masks, thereby preventing redundant optimization in overlapping regions. The proposed method achieves significant improvements over state-of-the-art approaches in stylization quality, visual realism, and multi-view consistency.
📝 Abstract
3D style transfer enables the creation of visually expressive 3D content, enriching the visual appearance of 3D scenes and objects. However, existing VGG- and CLIP-based methods cannot model multi-view consistency within the model itself, while diffusion-based approaches can capture such consistency, they rely on denoising directions, which leads to unstable training. To address these limitations, we propose DiffStyle3D, a novel diffusion-based paradigm for 3D Gaussian Splatting (3DGS) style transfer that optimizes directly in the latent space. Specifically, we introduce an Attention-Aware Loss that performs style transfer by aligning style features in the self-attention space, while preserving the original content through content feature alignment. Inspired by the geometric invariance of 3D stylization, we propose a Geometry-Guided Multi-View Consistency method that integrates geometric information into self-attention to enable cross-view correspondence modeling. Based on the same geometric information, we additionally construct a geometry-aware mask that prevents redundant optimization in overlapping regions across views, further improving multi-view consistency. Extensive experiments show that DiffStyle3D outperforms state-of-the-art methods, achieving higher stylization quality and visual realism.
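The abstract describes a loss that aligns style features in self-attention space, preserves content via feature alignment, and masks overlapping regions across views. The sketch below is a hypothetical, simplified illustration of that idea, not the paper's implementation: `self_attention_map`, `attention_aware_loss`, the projection matrices `w_q`/`w_k`, the weight `lam`, and the token-level `mask` are all assumed names and stand-ins (the mask here plays the role of the geometry-aware mask, supplied externally rather than derived from geometry).

```python
import numpy as np

def self_attention_map(x, w_q, w_k):
    """Project latent tokens x (n_tokens, d) into query/key space and
    return the row-softmax self-attention map (n_tokens, n_tokens)."""
    q, k = x @ w_q, x @ w_k
    logits = q @ k.T / np.sqrt(k.shape[1])
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    a = np.exp(logits)
    return a / a.sum(axis=1, keepdims=True)

def attention_aware_loss(z_render, z_style, z_content, w_q, w_k,
                         mask=None, lam=1.0):
    """Toy attention-aware loss: a style term matching the rendered
    view's self-attention map to the style image's, plus a content term
    keeping raw latents close to the content latents. `mask` (n_tokens,)
    down-weights tokens already covered by an overlapping view,
    mimicking the role of a geometry-aware mask."""
    a_render = self_attention_map(z_render, w_q, w_k)
    a_style = self_attention_map(z_style, w_q, w_k)
    m = np.ones(len(z_render)) if mask is None else np.asarray(mask)
    style_term = np.mean(m[:, None] * (a_render - a_style) ** 2)
    content_term = np.mean(m[:, None] * (z_render - z_content) ** 2)
    return style_term + lam * content_term
```

Under this toy formulation, the loss vanishes when the rendered latents coincide with both the style and content latents, and masked-out tokens (mask entries of 0) contribute nothing, so overlapping regions are not optimized twice.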