Mesh4D: 4D Mesh Reconstruction and Tracking from Monocular Video

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of reconstructing complete 4D meshes (3D geometry together with its temporal dynamics) from monocular videos of dynamic objects. The authors propose a feed-forward 4D reconstruction framework built around a compact latent space: an autoencoder with spatio-temporal attention encodes the object's deformation field over the whole sequence, and its training is guided by the skeletal structure of the training objects. A latent diffusion model, conditioned on the input video and the mesh reconstructed from the first frame, then predicts the entire animation in one shot, and no skeletal information is required at inference time. On 4D reconstruction and novel view synthesis benchmarks, the method outperforms existing approaches, recovering more accurate 3D shape and complex dynamic deformations.
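The summary mentions a deformation-field representation without giving its exact form. A minimal sketch, assuming the simplest case: per-vertex displacements added to the mesh reconstructed from the first frame, so the face connectivity stays fixed and only vertex positions change over time. The function `apply_deformation_field` and the toy data are hypothetical illustrations, not Mesh4D's actual parameterization.

```python
import numpy as np

def apply_deformation_field(rest_vertices, displacements):
    """Animate a mesh by adding per-frame, per-vertex displacements.

    ASSUMPTION: the deformation field is a plain (T, N, 3) array of offsets
    relative to the first-frame mesh; the paper's exact form may differ.
    rest_vertices: (N, 3) vertices of the first-frame mesh.
    displacements: (T, N, 3) one offset per vertex per frame.
    Returns (T, N, 3) vertex positions; faces are shared by all frames.
    """
    rest_vertices = np.asarray(rest_vertices, dtype=float)
    displacements = np.asarray(displacements, dtype=float)
    if displacements.ndim != 3 or displacements.shape[1:] != rest_vertices.shape:
        raise ValueError("displacements must have shape (T, N, 3)")
    # Broadcasting adds the same rest pose to every frame's offsets.
    return rest_vertices[None, :, :] + displacements

# Toy example: one triangle translated along +x by 0.5 per frame over 3 frames.
rest = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
faces = np.array([[0, 1, 2]])  # topology stays fixed across frames
t = np.arange(3, dtype=float)[:, None, None]  # frame indices 0, 1, 2
disp = np.broadcast_to(t * np.array([0.5, 0.0, 0.0]), (3, 3, 3))
frames = apply_deformation_field(rest, disp)
print(frames[2, 1])  # → [2. 0. 0.]
```

Keeping topology fixed is what makes this a tracking representation: vertex `i` in frame 0 corresponds to vertex `i` in every later frame.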

📝 Abstract
We propose Mesh4D, a feed-forward model for monocular 4D mesh reconstruction. Given a monocular video of a dynamic object, our model reconstructs the object's complete 3D shape and motion, represented as a deformation field. Our key contribution is a compact latent space that encodes the entire animation sequence in a single pass. This latent space is learned by an autoencoder that, during training, is guided by the skeletal structure of the training objects, providing strong priors on plausible deformations. Crucially, skeletal information is not required at inference time. The encoder employs spatio-temporal attention, yielding a more stable representation of the object's overall deformation. Building on this representation, we train a latent diffusion model that, conditioned on the input video and the mesh reconstructed from the first frame, predicts the full animation in one shot. We evaluate Mesh4D on reconstruction and novel view synthesis benchmarks, outperforming prior methods in recovering accurate 3D shape and deformation.
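The abstract credits spatio-temporal attention in the encoder with a more stable representation of the overall deformation, but does not say how tokens are arranged. The sketch below assumes the simplest variant: every (frame, point) feature is one token, and a single attention head lets each token attend to all points in all frames. The function name, token layout, and random projection weights are hypothetical NumPy illustrations, not the authors' encoder.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def spatiotemporal_attention(tokens, w_q, w_k, w_v):
    """Single-head self-attention over all (frame, point) tokens jointly.

    ASSUMPTION: tokens are per-point features for every frame; the actual
    Mesh4D tokenization is not specified in the abstract.
    tokens: (T, N, D) features for N surface points in each of T frames.
    w_q, w_k, w_v: (D, D) projection matrices.
    """
    T, N, D = tokens.shape
    x = tokens.reshape(T * N, D)  # flatten time and space into one sequence
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    weights = softmax(q @ k.T / np.sqrt(D))  # (T*N, T*N): each row sums to 1
    out = weights @ v
    return out.reshape(T, N, D), weights

rng = np.random.default_rng(0)
T, N, D = 4, 16, 8
tokens = rng.standard_normal((T, N, D))
# Hypothetical randomly initialised projections, scaled by 1/sqrt(D).
w_q, w_k, w_v = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
out, weights = spatiotemporal_attention(tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)  # → (4, 16, 8) (64, 64)
```

Because a point's feature in one frame can draw on the same and neighbouring points in every other frame, the encoding reflects the whole motion rather than a single pose. Joint attention costs memory quadratic in T·N, so practical encoders often factorize it into separate spatial and temporal passes; whether Mesh4D does so is not stated here.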

Problem

Research questions and friction points this paper is trying to address.

4D mesh reconstruction
monocular video
dynamic object
shape and motion
deformation tracking

Innovation

Methods, ideas, or system contributions that make the work stand out.

4D mesh reconstruction
latent diffusion model
spatio-temporal attention
monocular video
deformation field