MeshMamba: State Space Models for Articulated 3D Mesh Generation and Reconstruction

📅 2025-07-20
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenging problem of generating and reconstructing high-resolution (>10K vertices), clothed, and hand-detailed articulated 3D human meshes. We propose MeshMamba, the first framework to introduce the Mamba state-space model into 3D articulated human modeling. It features a structure-aware vertex serialization strategy—based on anatomical body parts and template spatial coordinates—that enables efficient modeling of meshes with tens of thousands of vertices. The framework comprises two core models: (i) MambaDiff3D, a diffusion-based generative model that outperforms prior methods in synthesizing densely clothed and hand-posed meshes; and (ii) Mamba-HMR, a single-image-to-3D reconstruction model achieving state-of-the-art accuracy in full-body recovery—including facial geometry, hand articulation, and clothing—while operating at near-real-time inference speed. This work advances non-parametric, full-body mesh reconstruction toward practical deployment.

Technology Category

Application Category

📝 Abstract
In this paper, we introduce MeshMamba, a neural network model for learning 3D articulated mesh models by employing the recently proposed Mamba State Space Models (Mamba-SSMs). MeshMamba is efficient and scalable in handling a large number of input tokens, enabling the generation and reconstruction of body mesh models with more than 10,000 vertices, capturing clothing and hand geometries. The key to effectively learning MeshMamba is the serialization technique of mesh vertices into orderings that are easily processed by Mamba. This is achieved by sorting the vertices based on body part annotations or the 3D vertex locations of a template mesh, such that the ordering respects the structure of articulated shapes. Based on MeshMamba, we design 1) MambaDiff3D, a denoising diffusion model for generating 3D articulated meshes and 2) Mamba-HMR, a 3D human mesh recovery model that reconstructs a human body shape and pose from a single image. Experimental results showed that MambaDiff3D can generate dense 3D human meshes in clothes, with grasping hands, etc., and outperforms previous approaches in the 3D human shape generation task. Additionally, Mamba-HMR extends the capabilities of previous non-parametric human mesh recovery approaches, which were limited to handling body-only poses using around 500 vertex tokens, to the whole-body setting with face and hands, while achieving competitive performance in (near) real-time.
Problem

Research questions and friction points this paper is trying to address.

Generating 3D articulated meshes with clothing and hand details
Reconstructing human body shape and pose from single images
Handling large-scale mesh data efficiently with Mamba-SSMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Mamba-SSMs for 3D mesh generation
Serializes vertices via body part annotations
Extends to whole-body reconstruction efficiently
🔎 Similar Papers
No similar papers found.
Y
Yusuke Yoshiyasu
National Institute of Advanced Industrial Science and Technology (AIST)
Leyuan Sun
Leyuan Sun
Wuxi University
Embodied AI navigation3D vision
Ryusuke Sagawa
Ryusuke Sagawa
Unknown affiliation