VRsketch2Gaussian: 3D VR Sketch Guided 3D Object Generation with Gaussian Splatting

📅 2025-03-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of data and methods for multimodal VR sketch-driven 3D generation by proposing what the authors describe as the first VR sketch-guided, multimodal, native 3D generation framework built on 3D Gaussian Splatting (3DGS). It makes three contributions: (1) a two-stage Sketch-CLIP feature alignment strategy that bridges the domain gap between sparse VR sketch embeddings and CLIP embeddings, supporting both retrieval and generation; (2) decoupled conditioning, in which VR sketches control geometry and text prompts control appearance, making generation more controllable; and (3) VRSS, described as the first large-scale paired dataset of VR sketches, text descriptions, images, and 3DGS representations. Experiments on VRSS indicate high-quality, texture-rich 3DGS outputs, and the 3D-native design is reported to enable fast generation.

📝 Abstract

We propose VRsketch2Gaussian, the first VR sketch-guided, multi-modal, native 3D object generation framework that incorporates a 3D Gaussian Splatting (3DGS) representation. As part of our work, we introduce VRSS, the first large-scale paired dataset containing VR sketches, text, images, and 3DGS, bridging the gap in multi-modal VR sketch-based generation. Our approach features the following key innovations: 1) Sketch-CLIP feature alignment. We propose a two-stage alignment strategy that bridges the domain gap between sparse VR sketch embeddings and rich CLIP embeddings, facilitating both VR sketch-based retrieval and generation tasks. 2) Fine-grained multi-modal conditioning. We disentangle the 3D generation process by using explicit VR sketches for geometric conditioning and text descriptions for appearance control. To facilitate this, we propose a generalizable VR sketch encoder that effectively aligns different modalities. 3) Efficient and high-fidelity 3D-native generation. Our method leverages a 3D-native generation approach that enables fast and texture-rich 3D object synthesis. Experiments conducted on our VRSS dataset demonstrate that our method achieves high-quality, multi-modal VR sketch-based 3D generation. We believe our VRSS dataset and VRsketch2Gaussian method will be beneficial for the 3D generation community.
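
The abstract does not spell out how the two-stage alignment is trained. As a rough illustration of what aligning sketch embeddings with CLIP embeddings can look like, the sketch below computes a symmetric contrastive (InfoNCE-style) loss over paired embeddings. This is a commonly used alignment objective and is an assumption here, not the authors' stated loss; the function names, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce(sketch_embs, clip_embs, temperature=0.07):
    """Symmetric contrastive loss; row i of each list is assumed to be a matching pair.

    Hypothetical stand-in for an alignment objective, not the paper's loss.
    """
    s = [l2_normalize(v) for v in sketch_embs]
    c = [l2_normalize(v) for v in clip_embs]
    n = len(s)
    # Cosine similarity of every sketch against every CLIP embedding, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(s[i], c[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_on_diagonal(rows):
        # Mean of -log softmax(row)[i], computed with a stable log-sum-exp.
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            lse = m + math.log(sum(math.exp(x - m) for x in row))
            total += lse - row[i]
        return total / len(rows)

    columns = [list(col) for col in zip(*logits)]
    # Average the sketch-to-CLIP and CLIP-to-sketch directions.
    return 0.5 * (cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(columns))


# Toy check: matched pairs should give a much lower loss than mismatched ones.
aligned_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = info_nce([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing a loss of this shape pulls each sketch embedding toward its paired CLIP embedding and away from the other items in the batch, which is one way a sparse sketch representation could be brought into a CLIP-like space.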

Problem

Research questions and friction points this paper is trying to address.

Generates 3D objects from VR sketches using Gaussian Splatting

Introduces VRSS dataset for multi-modal VR sketch-based generation

Aligns VR sketch embeddings with CLIP for better retrieval and generation
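
Once sketch embeddings live in the same space as CLIP embeddings, retrieval can reduce to a nearest-neighbour search. The snippet below is a minimal sketch of that idea using cosine similarity; the gallery contents, identifiers, and the `retrieve` helper are hypothetical and are not taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_emb, gallery, k=1):
    """Return the ids of the k gallery items most similar to the query embedding.

    gallery maps an item id to its embedding (hypothetical layout).
    """
    ranked = sorted(
        gallery,
        key=lambda item_id: cosine(query_emb, gallery[item_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy gallery of CLIP-like embeddings and an aligned sketch embedding as the query.
gallery = {
    "chair": [1.0, 0.0, 0.0],
    "table": [0.0, 1.0, 0.0],
    "lamp": [0.0, 0.0, 1.0],
}
top_two = retrieve([0.9, 0.1, 0.0], gallery, k=2)
```

A sorted scan is enough for a toy gallery; a real dataset would more likely use a batched matrix product or an approximate nearest-neighbour index.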

Innovation

Methods, ideas, or system contributions that make the work stand out.

VR sketch-guided 3D object generation

Sketch-CLIP feature alignment strategy

3D-native high-fidelity object synthesis
