DAGSM: Disentangled Avatar Generation with GS-enhanced Mesh

📅 2024-11-20
🏛️ arXiv.org
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
Existing text-driven virtual human generation methods jointly model the human body and clothing, which limits fine-grained editing (e.g., garment replacement) and photorealistic cloth dynamics. This paper proposes the first text-driven *disentangled* virtual human generation framework: it separately models the body, top garment, and bottom garment as Gaussian Splatting–enhanced meshes (GSMs), enabling semantically controllable, stage-wise generation. To improve texture fidelity and stylistic consistency, it introduces a cross-view attention-based texture refinement module and an incident-angle-weighted denoising (IAW-DE) strategy. Experiments demonstrate that the method significantly outperforms state-of-the-art baselines in visual quality, garment editability, and dynamic cloth realism.

๐Ÿ“ Abstract
Text-driven avatar generation has gained significant attention owing to its convenience. However, existing methods typically model the human body together with all garments as a single 3D model, which limits usability (e.g., clothing replacement) and reduces user control over the generation process. To overcome these limitations, we propose DAGSM, a novel pipeline that generates disentangled human bodies and garments from the given text prompts. Specifically, we model each part (e.g., body, upper/lower clothes) of the clothed human as one GS-enhanced mesh (GSM), a traditional mesh with 2D Gaussians attached to better handle complicated textures (e.g., woolen, translucent clothes) and produce realistic cloth animations. During generation, we first create the unclothed body and then generate each garment in sequence on top of it, introducing a semantic-based algorithm to achieve better human-cloth and garment-garment separation. To improve texture quality, we propose a view-consistent texture refinement module, comprising a cross-view attention mechanism for texture style consistency and an incident-angle-weighted denoising (IAW-DE) strategy to update the appearance. Extensive experiments demonstrate that DAGSM generates high-quality disentangled avatars, supports clothing replacement and realistic animation, and outperforms the baselines in visual quality.
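The abstract's IAW-DE strategy weights each view's contribution to the texture update by how head-on that view sees the surface. A minimal sketch of that idea is below; the cosine-power weight formula, the sharpness parameter, and all function names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def iaw_weights(normals, view_dirs, sharpness=2.0):
    """Per-point blending weights from the incidence angle.

    normals   : (N, 3) unit surface normals
    view_dirs : (N, 3) unit directions from surface point toward the camera
    Views that see a point nearly head-on (cos theta close to 1) dominate;
    grazing views (cos theta close to 0) contribute little.
    """
    cos_theta = np.clip(np.sum(normals * view_dirs, axis=-1), 0.0, 1.0)
    return cos_theta ** sharpness

def blend_views(colors, weights, eps=1e-8):
    """Incidence-weighted average of per-view denoised colors.

    colors  : (V, N, 3) colors of N surface points seen from V views
    weights : (V, N)    incidence weights per view and point
    """
    w = weights[..., None]
    return (colors * w).sum(axis=0) / (w.sum(axis=0) + eps)
```

For example, a point seen head-on by one camera and at a grazing angle by another would take its updated color almost entirely from the first view.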
Problem

Research questions and friction points this paper is trying to address.

Generates disentangled human bodies and garments from text prompts.
Improves texture quality with view-consistent refinement and denoising.
Enables clothing replacement and realistic animation in avatar generation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

GS-enhanced mesh for realistic textures
Semantic-based algorithm for garment separation
View-consistent texture refinement module
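The GS-enhanced mesh binds 2D Gaussians to a traditional mesh so the splats follow the surface during cloth animation. A minimal sketch of one plausible data layout is below; the barycentric binding scheme and all names are assumptions for illustration, not DAGSM's actual representation.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class SurfaceGaussian:
    face_id: int             # index of the mesh triangle the splat is bound to
    barycentric: np.ndarray  # (3,) position within that triangle
    scale: np.ndarray        # (2,) in-plane extents of the 2D splat
    color: np.ndarray        # (3,) RGB
    opacity: float

def splat_positions(vertices, faces, splats):
    """World-space centers of the bound splats.

    Because each Gaussian is stored in barycentric coordinates of its host
    triangle, deforming the mesh (e.g., animating the cloth) carries the
    splats along with the surface automatically.
    """
    centers = np.empty((len(splats), 3))
    for i, s in enumerate(splats):
        tri = vertices[faces[s.face_id]]  # (3, 3) triangle corner positions
        centers[i] = s.barycentric @ tri  # barycentric interpolation
    return centers
```

Storing splats relative to faces, rather than in world space, is what lets a single mesh deformation drive both the geometry and the splat-based appearance.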