🤖 AI Summary
This work addresses 3D style transfer without pose annotations or scene-specific optimization, enabling zero-shot generalization from a single input image up to unposed multi-view inputs. The proposed method, Stylos, uses a dual-path Transformer: a geometry branch retains self-attention to preserve the structural fidelity of the predicted 3D Gaussians, while a style branch injects a reference style via global cross-attention to keep the stylization semantically and visually consistent across views. A voxelized 3D style loss further decouples geometric fidelity from multi-view appearance consistency. Across multiple benchmarks, the framework produces high-fidelity, view-consistent stylized 3D Gaussian reconstructions with improved geometric accuracy and cross-view style coherence, and it generalizes to unseen object categories, scenes, and styles while scaling from single-view to large multi-view settings.
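The paper does not include code; below is a minimal NumPy sketch of the global cross-attention used for style injection, under the assumption that queries come from the content tokens (all views pooled into one sequence) while keys and values come from the style image's tokens, so every view attends to the same style representation. The function name, random projection weights, and dimensions are illustrative stand-ins for learned parameters, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_cross_attention(content_tokens, style_tokens, d_k=64, seed=0):
    """Sketch of style injection via cross-attention.

    content_tokens: (N_content, d_c) tokens from ALL views concatenated,
                    so the same style signal reaches every view (the
                    "global" part of global cross-attention).
    style_tokens:   (N_style, d_s) tokens from the reference style image.
    The random projections stand in for learned weight matrices.
    """
    rng = np.random.default_rng(seed)
    d_c = content_tokens.shape[-1]
    d_s = style_tokens.shape[-1]
    Wq = rng.standard_normal((d_c, d_k)) / np.sqrt(d_c)
    Wk = rng.standard_normal((d_s, d_k)) / np.sqrt(d_s)
    Wv = rng.standard_normal((d_s, d_c)) / np.sqrt(d_s)

    Q = content_tokens @ Wq            # queries from content (geometry path untouched)
    K = style_tokens @ Wk              # keys from the style image
    V = style_tokens @ Wv              # values from the style image
    attn = softmax(Q @ K.T / np.sqrt(d_k))  # (N_content, N_style)
    return content_tokens + attn @ V   # residual style injection
```

Because all views share one query sequence attending to one style token set, the injected style statistics are identical across views, which is one plausible way to obtain the cross-view consistency the summary describes.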
📝 Abstract
We present Stylos, a single-forward 3D Gaussian framework for 3D style transfer that operates on unposed content, from a single image to a multi-view collection, conditioned on a separate reference style image. Stylos synthesizes a stylized 3D Gaussian scene without per-scene optimization or precomputed poses, achieving geometry-aware, view-consistent stylization that generalizes to unseen categories, scenes, and styles. At its core, Stylos adopts a Transformer backbone with two pathways: geometry predictions retain self-attention to preserve geometric fidelity, while style is injected via global cross-attention to enforce visual consistency across views. With the addition of a voxel-based 3D style loss that aligns aggregated scene features to style statistics, Stylos enforces view-consistent stylization while preserving geometry. Experiments across multiple datasets demonstrate that Stylos delivers high-quality zero-shot stylization, highlighting the effectiveness of global style-content coupling, the proposed 3D style loss, and the scalability of our framework from single view to large-scale multi-view settings.
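The abstract says the 3D style loss "aligns aggregated scene features to style statistics" but does not specify the aggregation or the statistics. The sketch below is one plausible reading, assuming mean pooling of per-Gaussian features into voxels and channel-wise mean/std matching (an AdaIN-style choice that is an assumption, not the paper's stated formulation); all names are hypothetical.

```python
import numpy as np

def voxel_style_loss(points, feats, style_feats, voxel_size=0.1):
    """Hedged sketch of a voxel-based 3D style loss.

    points:      (N, 3) 3D positions of the Gaussians.
    feats:       (N, C) per-Gaussian appearance features.
    style_feats: (M, C) features extracted from the style image.
    Aggregating in 3D (rather than per rendered view) ties the style
    statistics to the scene itself, decoupling them from any single view.
    """
    # Assign each Gaussian to a voxel by flooring its coordinates.
    idx = np.floor(points / voxel_size).astype(np.int64)
    _, inverse = np.unique(idx, axis=0, return_inverse=True)

    # Mean-pool features within each occupied voxel.
    n_vox = inverse.max() + 1
    pooled = np.zeros((n_vox, feats.shape[1]))
    counts = np.zeros(n_vox)
    np.add.at(pooled, inverse, feats)
    np.add.at(counts, inverse, 1.0)
    pooled /= counts[:, None]

    # Match channel-wise mean/std of pooled scene features to the style's
    # (mean/std matching is an assumed choice of "style statistics").
    mu_s, sd_s = pooled.mean(axis=0), pooled.std(axis=0)
    mu_y, sd_y = style_feats.mean(axis=0), style_feats.std(axis=0)
    return float(np.mean((mu_s - mu_y) ** 2) + np.mean((sd_s - sd_y) ** 2))
```

Because the statistics are computed once over the voxelized scene, every rendered view is supervised toward the same target, which matches the abstract's claim of view-consistent stylization without constraining geometry.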