VRSketch2Gaussian: 3D VR Sketch Guided 3D Object Generation with Gaussian Splatting

📅 2025-03-16
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the research gap in multimodal VR sketch-driven 3D generation by proposing the first framework for multimodal 3D Gaussian Splatting (3DGS) generation guided by native, hand-drawn VR sketches. Methodologically: (1) it introduces a two-stage Sketch-CLIP feature alignment mechanism that enables cross-modal semantic coordination between VR sketches and text prompts; (2) it decouples geometric control (from sketches) from appearance control (from text), improving generation controllability; and (3) it constructs VRSS, the first large-scale four-modal paired dataset comprising VR sketches, text descriptions, rendered images, and corresponding 3DGS representations. Experiments on VRSS demonstrate significant improvements in geometric fidelity and texture quality, yielding high-fidelity, renderable 3DGS models, and the framework achieves faster inference than existing sketch-to-3D approaches.
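The summary's "Sketch-CLIP feature alignment" is a contrastive scheme that pulls a sketch encoder's embeddings toward the CLIP embeddings of paired text/images. The paper does not publish its loss here, so the following is only a minimal illustrative sketch of the standard symmetric InfoNCE objective typically used for such alignment; the function name, temperature value, and NumPy implementation are all assumptions, not the authors' code.

```python
import numpy as np

def info_nce_loss(sketch_emb, clip_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    sketch_emb, clip_emb: (N, D) arrays where row i of each is a matched
    pair (a VR sketch and the CLIP embedding of its paired text/image).
    """
    # L2-normalize so the dot product is cosine similarity.
    s = sketch_emb / np.linalg.norm(sketch_emb, axis=1, keepdims=True)
    c = clip_emb / np.linalg.norm(clip_emb, axis=1, keepdims=True)
    logits = s @ c.T / temperature   # (N, N) similarity matrix
    labels = np.arange(len(s))       # matched pairs lie on the diagonal

    def xent(l):
        # Numerically stable cross-entropy against the diagonal targets.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the sketch->CLIP and CLIP->sketch directions.
    return 0.5 * (xent(logits) + xent(logits.T))
```

Minimizing this loss makes each sketch embedding most similar to its own paired CLIP embedding, which is what enables the sketch-based retrieval the summary mentions.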

πŸ“ Abstract
We propose VRSketch2Gaussian, the first VR sketch-guided, multi-modal, native 3D object generation framework that incorporates a 3D Gaussian Splatting representation. As part of our work, we introduce VRSS, the first large-scale paired dataset containing VR sketches, text, images, and 3DGS, bridging the gap in multi-modal VR sketch-based generation. Our approach features the following key innovations: 1) Sketch-CLIP feature alignment. We propose a two-stage alignment strategy that bridges the domain gap between sparse VR sketch embeddings and rich CLIP embeddings, facilitating both VR sketch-based retrieval and generation tasks. 2) Fine-grained multi-modal conditioning. We disentangle the 3D generation process by using explicit VR sketches for geometric conditioning and text descriptions for appearance control. To facilitate this, we propose a generalizable VR sketch encoder that effectively aligns different modalities. 3) Efficient and high-fidelity 3D native generation. Our method leverages a 3D-native generation approach that enables fast and texture-rich 3D object synthesis. Experiments conducted on our VRSS dataset demonstrate that our method achieves high-quality, multi-modal VR sketch-based 3D generation. We believe our VRSS dataset and VRSketch2Gaussian method will be beneficial for the 3D generation community.
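The abstract's output representation, 3D Gaussian Splatting, models a scene as a set of anisotropic 3D Gaussians. As background for readers unfamiliar with 3DGS, here is a minimal sketch of the per-primitive parameters and the standard covariance construction Sigma = R S S^T R^T from the original 3DGS formulation; the class name and field layout are illustrative assumptions, not this paper's data structures.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Gaussian3D:
    # Per-primitive parameters in the common 3DGS convention
    # (illustrative; not taken from the paper's implementation).
    position: np.ndarray   # (3,) world-space center
    scale: np.ndarray      # (3,) per-axis scale
    rotation: np.ndarray   # (4,) quaternion (w, x, y, z)
    opacity: float         # alpha used during splatting
    sh_coeffs: np.ndarray  # spherical-harmonic color coefficients

    def covariance(self) -> np.ndarray:
        """Anisotropic 3D covariance Sigma = R diag(s)^2 R^T."""
        w, x, y, z = self.rotation / np.linalg.norm(self.rotation)
        # Rotation matrix from the (normalized) quaternion.
        R = np.array([
            [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
            [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
            [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
        ])
        S = np.diag(self.scale)
        return R @ S @ S @ R.T
```

Because each primitive is explicit and differentiable, a generator can emit these parameters directly, which is what makes 3DGS output fast to render compared to implicit representations.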
Problem

Research questions and friction points this paper is trying to address.

How to generate 3D objects directly from sparse, hand-drawn VR sketches
Lack of a large-scale dataset pairing VR sketches with text, images, and 3D representations
Domain gap between sparse VR sketch embeddings and rich CLIP embeddings
Innovation

Methods, ideas, or system contributions that make the work stand out.

VR sketch-guided 3D object generation
Sketch-CLIP feature alignment strategy
3D-native high-fidelity object synthesis
🔎 Similar Papers
No similar papers found.
Authors
Songen Gu (UCAS)
Haoxuan Song (UCAS)
Binjie Liu (Communication University of China)
Qian Yu (Professor, Dept of Earth, Geographic, and Climate Sciences, University of Massachusetts-Amherst)
Sanyi Zhang (Communication University of China)
Haiyong Jiang (UCAS)
Jin Huang (UCAS)
Feng Tian (Institute of Software, CAS; UCAS)