🤖 AI Summary
This paper introduces the first end-to-end framework for automatically generating 3D wearable assets, addressing the challenge of text- or image-driven clothing and accessory synthesis that robustly adapts to arbitrary 3D human body shapes and poses. Methodologically: (1) it proposes a body-aligned generative paradigm that incorporates SMPL body parameters to guide multi-view diffusion; (2) it designs a ControlNet conditioned on surface-coordinate projections to enable precise pose control; and (3) it combines multi-view silhouette supervision with physics-based penetration correction to ensure geometric fidelity and physical plausibility. Experiments demonstrate that the method significantly outperforms prior approaches in prompt adherence, shape diversity, and garment-body alignment accuracy. The generated 3D wearables are high-fidelity, pose-adaptive, and directly usable for rendering and physics-based simulation.
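The penetration correction mentioned above can be illustrated with a deliberately simplified sketch: for each asset vertex, find the nearest body vertex and, if the asset vertex lies behind that point's outward normal, push it out along the normal. The function name, the nearest-vertex heuristic, and the `margin` parameter are illustrative assumptions; the paper itself resolves penetration with a physics simulator, not this procedure.

```python
import numpy as np

def push_out_penetrations(asset_v, body_v, body_n, margin=0.002):
    """Toy penetration correction (a stand-in for the paper's
    physics-simulator step, NOT its actual method): project each
    penetrating asset vertex outside the body along the outward
    normal of its nearest body vertex, plus a small safety margin."""
    out = asset_v.copy()
    for i, p in enumerate(asset_v):
        j = np.argmin(np.linalg.norm(body_v - p, axis=1))  # nearest body point
        d = np.dot(p - body_v[j], body_n[j])               # signed offset along normal
        if d < margin:                                     # inside body or too close
            out[i] = p + (margin - d) * body_n[j]
    return out
```

In practice the signed distance would come from the body mesh's SDF rather than a nearest-vertex heuristic, and a simulator additionally preserves garment geometry while resolving contacts.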
📝 Abstract
While recent advances have shown remarkable progress in general 3D shape generation, the challenge of leveraging these approaches to automatically generate wearable 3D assets remains unexplored. To this end, we present BAG, a Body-aligned Asset Generation method that outputs 3D wearable assets which can be automatically dressed on given 3D human bodies. This is achieved by controlling the 3D generation process with human body shape and pose information. Specifically, we first build a general single-image-to-multiview diffusion model that produces consistent multiview images, and train it on the large Objaverse dataset for diversity and generalizability. We then train a ControlNet to guide the multiview generator toward body-aligned multiview images. The control signal consists of multiview 2D projections of the target human body, in which pixel values encode the XYZ coordinates of the body surface in a canonical space. The body-conditioned multiview diffusion model generates body-aligned multiview images, which are then fed into a native 3D diffusion model to produce the 3D shape of the asset. Finally, by recovering the similarity transformation via multiview silhouette supervision and resolving asset-body penetration with physics simulators, the 3D asset can be accurately fitted onto the target human body. Experimental results demonstrate significant advantages over existing methods in image-prompt-following capability, shape diversity, and shape quality. Our project page is available at https://bag-3d.github.io/.
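The control signal described in the abstract, an image whose pixels store canonical-space XYZ surface coordinates, can be sketched roughly as below. This is a minimal point-splat rasterizer with assumed names and a simple pinhole camera (`K`, `R`, `t`); the paper presumably uses a proper mesh rasterizer, so treat this only as an illustration of the encoding, not as the authors' implementation.

```python
import numpy as np

def xyz_control_image(verts_canon, verts_posed, K, R, t, H=256, W=256):
    """Splat body surface points into an image whose pixel values encode
    the normalized canonical-space XYZ coordinate of the visible surface
    point (a toy stand-in for a mesh rasterizer)."""
    # Project posed vertices into the camera with a pinhole model.
    cam = R @ verts_posed.T + t[:, None]      # (3, N) camera-space points
    uv = (K @ cam).T                          # (N, 3) homogeneous pixels
    uv = uv[:, :2] / uv[:, 2:3]               # perspective divide
    depth = cam[2]

    # Normalize canonical coordinates to [0, 1] so they fit pixel values.
    lo, hi = verts_canon.min(0), verts_canon.max(0)
    colors = (verts_canon - lo) / (hi - lo + 1e-8)

    img = np.zeros((H, W, 3), dtype=np.float32)
    zbuf = np.full((H, W), np.inf)            # keep only the nearest point
    for (u, v), z, c in zip(uv, depth, colors):
        x, y = int(round(u)), int(round(v))
        if 0 <= x < W and 0 <= y < H and z < zbuf[y, x]:
            zbuf[y, x] = z
            img[y, x] = c
    return img
```

Because the colors are tied to the canonical (unposed) body, the same surface point maps to the same pixel value in every view and pose, which is what lets the ControlNet learn a consistent body-to-image correspondence.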