Stroke3D: Lifting 2D strokes into rigged 3D model via latent diffusion models

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D generation methods struggle to produce animatable geometry, while conventional rigging techniques lack fine-grained control over skeletal structures. This work proposes a two-stage framework that, according to the authors, is the first to generate rigged 3D meshes directly from user-drawn 2D strokes and a text prompt. The first stage pairs a Skeletal Graph VAE (Sk-VAE) with a Skeletal Graph DiT (Sk-DiT) to generate a skeleton that is controlled by the strokes and the text. The second stage synthesizes a textured mesh conditioned on that skeleton, with mesh quality improved through TextuRig data augmentation and SKA-DPO preference optimization. The authors report that the resulting models have plausible skeletons and high-quality meshes, giving users a more intuitive way to create animatable 3D content.

📝 Abstract
Rigged 3D assets are fundamental to 3D deformation and animation. However, existing 3D generation methods struggle to generate animatable geometry, while rigging techniques lack fine-grained structural control over skeleton creation. To address these limitations, we introduce Stroke3D, a novel framework that directly generates rigged meshes from two user inputs: 2D drawn strokes and a descriptive text prompt. Our approach pioneers a two-stage pipeline: 1) Controllable Skeleton Generation, where we employ the Skeletal Graph VAE (Sk-VAE) to encode the skeleton's graph structure into a latent space, in which the Skeletal Graph DiT (Sk-DiT) generates a skeletal embedding. The generation process is conditioned on both the text for semantics and the 2D strokes for explicit structural control, and the VAE's decoder reconstructs the final high-quality 3D skeleton; and 2) Enhanced Mesh Synthesis via TextuRig and SKA-DPO, where we synthesize a textured mesh conditioned on the generated skeleton. For this stage, we first enhance an existing skeleton-to-mesh model by augmenting its training data with TextuRig, a dataset of textured and rigged meshes with captions, curated from Objaverse-XL. Additionally, we employ a preference optimization strategy, SKA-DPO, guided by a skeleton-mesh alignment score, to further improve geometric fidelity. Together, these components enable a more intuitive workflow for creating ready-to-animate 3D content. To the best of our knowledge, our work is the first to generate rigged 3D meshes conditioned on user-drawn 2D strokes. Extensive experiments demonstrate that Stroke3D produces plausible skeletons and high-quality meshes.
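The abstract does not spell out how the skeleton-mesh alignment score guides preference optimization. As a rough illustration only, the sketch below ranks two candidate meshes with a hypothetical stand-in score (negative mean distance from each joint to its nearest mesh vertex) and evaluates the standard DPO loss for the resulting pair. The function names, the score definition, and the use of plain log-likelihoods are all assumptions made for this example, not the SKA-DPO formulation from the paper.

```python
import math


def alignment_score(joints, vertices):
    """Hypothetical stand-in for a skeleton-mesh alignment score.

    Returns the negative mean distance from each joint to its nearest
    mesh vertex, so a higher value means a closer fit. This is an
    assumption for illustration, not the score used in the paper.
    """
    total = 0.0
    for joint in joints:
        total += min(math.dist(joint, vertex) for vertex in vertices)
    return -total / len(joints)


def pick_preference_pair(joints, mesh_a, mesh_b):
    """Order two candidate meshes as (preferred, rejected) by score."""
    if alignment_score(joints, mesh_a) >= alignment_score(joints, mesh_b):
        return mesh_a, mesh_b
    return mesh_b, mesh_a


def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO loss for one (preferred, rejected) pair.

    logp_* are log-likelihoods under the model being trained and
    ref_logp_* are the same quantities under a frozen reference model.
    The loss is -log(sigmoid(beta * margin)).
    """
    margin = (logp_w - ref_logp_w) - (logp_l - ref_logp_l)
    x = beta * margin
    # -log(sigmoid(x)) equals log(1 + exp(-x)); guard against overflow.
    if x < -30.0:
        return -x
    return math.log1p(math.exp(-x))
```

With identical policy and reference likelihoods the loss equals log 2, and it falls as the trained model assigns relatively more likelihood to the better-aligned mesh than the reference does.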
Problem

Research questions and friction points this paper is trying to address.

animatable geometry
rigged 3D models
skeleton creation
structural control
3D generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

latent diffusion models
skeletal graph generation
rigged 3D mesh
2D-to-3D lifting
preference optimization