🤖 AI Summary
To address weak representational capacity and lack of task unification in 3D shape understanding and generation, this paper introduces Shape Tokens: a continuous, compact, and geometrically interpretable 3D shape representation. For the first time, 3D shapes are encoded as differentiable, generalizable tokens that serve as conditional vectors within a flow-matching generative framework—unifying 3D generation, image-to-3D reconstruction, cross-modal alignment, and variable-resolution rendering. Methodologically, we integrate delta-function-based density approximation, decoupled geometric attribute fields (normal, density, deformation), and a token-conditioned diffusion architecture. Experiments demonstrate consistent superiority over state-of-the-art baselines in generation fidelity, cross-modal alignment accuracy, and rendering flexibility—enabling real-time, resolution-adaptive rendering and systematic geometric analysis.
📝 Abstract
We introduce Shape Tokens, a 3D representation that is continuous, compact, and easy to incorporate into machine learning models. Shape Tokens act as conditioning vectors that represent shape information in a 3D flow-matching model. The flow-matching model is trained to approximate probability density functions corresponding to delta functions concentrated on the surfaces of shapes in 3D. By attaching Shape Tokens to various machine learning models, we can generate new shapes, convert images to 3D, align 3D shapes with text and images, and render shapes directly at variable, user specified, resolution. Moreover, Shape Tokens enable a systematic analysis of geometric properties such as normal, density, and deformation field. Across all tasks and experiments, utilizing Shape Tokens demonstrate strong performance compared to existing baselines.