DiffStyle3D: Consistent 3D Gaussian Stylization via Attention Optimization

📅 2026-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D style transfer methods struggle to effectively model multi-view consistency and suffer from unstable training dynamics in diffusion-based frameworks. This work proposes a novel 3D style transfer paradigm that integrates diffusion models with 3D Gaussian representations. By leveraging self-attention mechanisms in the latent space, the approach aligns style and content features while incorporating geometry-guided multi-view consistency modeling. Specifically, an attention-aware loss function is introduced, which exploits geometric information to establish cross-view correspondences and generate geometry-aware masks, thereby preventing redundant optimization in overlapping regions. The proposed method achieves significant improvements over state-of-the-art approaches in terms of stylization quality, visual realism, and multi-view consistency.
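The geometry-aware masking idea in the summary can be illustrated with a small sketch. This is not the paper's implementation; it is a minimal, hypothetical version of the underlying cross-view overlap test: 3D points visible in one view are projected into a second camera, and points that also land inside that view are flagged as overlapping so they are not optimized twice. All function and parameter names (`geometry_aware_mask`, `K_b`, `T_b`) are assumptions for illustration.

```python
import numpy as np

def geometry_aware_mask(points_a, K_b, T_b, hw):
    """Hypothetical sketch of a geometry-aware mask (not the paper's code).

    points_a: (N, 3) world-space 3D points rendered in view A
    K_b:      (3, 3) pinhole intrinsics of view B
    T_b:      (4, 4) world-to-camera transform of view B
    hw:       (height, width) of view B's image

    Returns a boolean mask over A's points: True = keep optimizing,
    False = the point also projects inside view B (overlapping region).
    """
    h, w = hw
    # Homogenize and move points into view B's camera frame.
    pts_h = np.concatenate([points_a, np.ones((len(points_a), 1))], axis=1)
    cam = (T_b @ pts_h.T).T[:, :3]
    in_front = cam[:, 2] > 0
    # Pinhole projection; clip depth to avoid division by zero.
    proj = (K_b @ cam.T).T
    uv = proj[:, :2] / np.clip(proj[:, 2:3], 1e-8, None)
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    overlap = in_front & inside
    return ~overlap
```

In a full pipeline this per-point decision would be rasterized into an image-space mask that gates the stylization loss, so each surface region is optimized from only one view at a time.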

📝 Abstract
3D style transfer enables the creation of visually expressive 3D content, enriching the visual appearance of 3D scenes and objects. However, existing VGG- and CLIP-based methods struggle to model multi-view consistency within the model itself, while diffusion-based approaches can capture such consistency but rely on denoising directions, leading to unstable training. To address these limitations, we propose DiffStyle3D, a novel diffusion-based paradigm for 3DGS style transfer that directly optimizes in the latent space. Specifically, we introduce an Attention-Aware Loss that performs style transfer by aligning style features in the self-attention space, while preserving original content through content feature alignment. Inspired by the geometric invariance of 3D stylization, we propose a Geometry-Guided Multi-View Consistency method that integrates geometric information into self-attention to enable cross-view correspondence modeling. Based on geometric information, we additionally construct a geometry-aware mask to prevent redundant optimization in overlapping regions across views, which further improves multi-view consistency. Extensive experiments show that DiffStyle3D outperforms state-of-the-art methods, achieving higher stylization quality and visual realism.
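The Attention-Aware Loss described in the abstract can be sketched as follows. This is a toy, hypothetical stand-in rather than the authors' objective: the style term matches first- and second-moment statistics of self-attention outputs between the rendered and style features, the content term is a masked feature-alignment penalty, and the mask plays the role of the geometry-aware mask that zeroes out overlapping regions. All names and the single-head attention layer are assumptions for illustration.

```python
import numpy as np

def self_attention(x):
    """Toy single-head self-attention over token features x of shape (tokens, dim)."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ x  # attention-weighted values

def attention_aware_loss(render, style, content, mask, w_style=1.0, w_content=1.0):
    """Hypothetical attention-aware objective (illustrative, not the paper's loss).

    render, style, content: (tokens, dim) latent features
    mask: (tokens,) geometry-aware weights, 0 = overlapping region (skip),
          1 = region still to be optimized from this view
    """
    rv = self_attention(render)
    sv = self_attention(style)
    # Style: align mean/std statistics of attention outputs.
    style_loss = (np.mean((rv.mean(0) - sv.mean(0)) ** 2)
                  + np.mean((rv.std(0) - sv.std(0)) ** 2))
    # Content: masked alignment with the original content features.
    content_loss = np.mean(mask[:, None] * (render - content) ** 2)
    return w_style * style_loss + w_content * content_loss
```

The gradient of this loss with respect to the rendered features would then be backpropagated through the 3DGS renderer to update Gaussian attributes, which is the optimization-in-latent-space pattern the abstract describes.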
Problem

Research questions and friction points this paper is trying to address.

3D style transfer
multi-view consistency
diffusion-based methods
style transfer stability
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D style transfer
diffusion-based stylization
attention-aware loss
multi-view consistency
geometry-guided optimization
👥 Authors
Yitong Yang (Shanghai University of Finance and Economics)
Xuexin Liu (School of Computing and Artificial Intelligence, Shanghai University of Finance and Economics)
Yinglin Wang (School of Computing and Artificial Intelligence, Shanghai University of Finance and Economics)
Jing Wang (School of Computing and Artificial Intelligence, Shanghai University of Finance and Economics)
Hao Dou (Institute of Automation, Chinese Academy of Sciences; Machine Learning, Image Processing)
Changshuo Wang (MSCA Postdoctoral Fellow, University College London (UCL), United Kingdom; Computer Vision, Robot Perception, Point Cloud Analysis, Person Re-identification)
Shuting He (Assistant Professor, Shanghai University of Finance and Economics; Computer Vision)