🤖 AI Summary
This work addresses the challenge of efficiently generating high-fidelity, animatable 3D head meshes from a single image without test-time optimization or multi-view inputs. The authors propose a feed-forward, single-pass reconstruction framework with a dual-branch shape and texture architecture that models geometry and appearance jointly, using image features from a shared Transformer backbone. Central to the approach are an iterative GRU-based decoding mechanism, which deforms the geometry progressively to avoid mesh collapse and preserve topological integrity, and a reprojection-guided texture refinement strategy, which anchors appearance learning to the input image. The authors report that the method outperforms state-of-the-art approaches in reconstruction quality, animation capability, and inference efficiency.
📝 Abstract
We introduce MeshLAM, a feed-forward framework for one-shot animatable mesh head reconstruction that generates high-fidelity, animatable 3D head avatars from a single image. Unlike previous work that relies on time-consuming test-time optimization or extensive multi-view data, our method produces complete, inherently animatable mesh representations in a single forward pass. Our approach employs a dual shape and texture-map architecture that simultaneously processes the mesh vertices and a texture map using image features extracted by a shared transformer backbone, allowing for coherent shape carving and appearance modeling. To prevent mesh collapse and ensure topological integrity during feed-forward deformation, we propose an iterative GRU-based decoding mechanism with progressive geometry deformation and texture refinement, coupled with a novel reprojection-based texture guidance mechanism that anchors appearance learning to the input image. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches in reconstruction quality, animation capability, and computational efficiency. Project page at https://meshlam.github.io.
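The abstract does not spell out the decoder, so the following is only a minimal sketch of the general idea behind iterative GRU-based refinement: instead of predicting the final vertex positions in one jump, a recurrent cell emits a small, bounded offset at each step. Everything here is an assumption made for illustration and is not taken from MeshLAM: the names `TinyGRUCell` and `refine_vertices`, the random untrained weights, the per-vertex feature vectors, and the `step_scale` bound are all hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Dense matrix-vector product on plain Python lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyGRUCell:
    """Minimal GRU cell without biases; weights are random placeholders (assumption)."""

    def __init__(self, input_dim, hidden_dim, rng):
        d = input_dim + hidden_dim
        self.Wz = rand_matrix(hidden_dim, d, rng)  # update gate
        self.Wr = rand_matrix(hidden_dim, d, rng)  # reset gate
        self.Wh = rand_matrix(hidden_dim, d, rng)  # candidate state

    def step(self, x, h):
        xh = x + h  # list concatenation of input and hidden state
        z = [sigmoid(a) for a in matvec(self.Wz, xh)]
        r = [sigmoid(a) for a in matvec(self.Wr, xh)]
        xrh = x + [ri * hi for ri, hi in zip(r, h)]
        h_tilde = [math.tanh(a) for a in matvec(self.Wh, xrh)]
        # One common GRU convention: blend old state and candidate with gate z.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, h_tilde)]


def refine_vertices(template, features, num_iters=4, hidden_dim=8,
                    step_scale=0.05, seed=0):
    """Progressively deform `template` vertices using per-vertex features.

    Each iteration moves every coordinate by at most `step_scale`, so the
    mesh changes gradually rather than in one large step (hypothetical design).
    """
    rng = random.Random(seed)
    feat_dim = len(features[0])
    cell = TinyGRUCell(feat_dim + 3, hidden_dim, rng)
    W_out = rand_matrix(3, hidden_dim, rng)  # hidden state -> xyz offset
    verts = [list(v) for v in template]      # copy so the template is untouched
    hidden = [[0.0] * hidden_dim for _ in template]
    for _ in range(num_iters):
        for i, (v, f) in enumerate(zip(verts, features)):
            # Input to the cell: image feature plus the current vertex position.
            hidden[i] = cell.step(list(f) + v, hidden[i])
            delta = [step_scale * math.tanh(d) for d in matvec(W_out, hidden[i])]
            verts[i] = [a + b for a, b in zip(v, delta)]
    return verts


if __name__ == "__main__":
    template = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    features = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.9]]
    for vertex in refine_vertices(template, features):
        print([round(c, 4) for c in vertex])
```

Because each offset passes through `tanh` and is scaled by `step_scale`, the total displacement of any coordinate is bounded by `num_iters * step_scale`. In a trained model the weights would be learned and the features would come from the image backbone; here they are placeholders, so the output shows only the mechanics of the update, not a meaningful reconstruction.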