🤖 AI Summary
Autoregressive mesh generation models suffer from high inference latency because they require thousands to tens of thousands of sequential token predictions. To address this, we propose Multi-head Speculative Decoding, which predicts multiple candidate vertex/patch tokens in parallel under topological constraints and integrates a lightweight geometric validator with a dynamic resampling mechanism to preserve geometric fidelity. Furthermore, we introduce a knowledge-distillation-based training paradigm for the decoding heads that aligns their prediction distributions with the backbone's, significantly reducing validation overhead. Our method maintains identical topological validity and reconstruction quality, as measured by Chamfer Distance (CD) and F-Score, while achieving an average 1.7× inference speedup across multiple benchmarks. This work establishes a scalable pathway toward real-time, high-fidelity 3D mesh generation.
📝 Abstract
Current auto-regressive models can generate high-quality, topologically precise meshes; however, they necessitate thousands, or even tens of thousands, of next-token predictions during inference, resulting in substantial latency. We introduce XSpecMesh, a quality-preserving acceleration method for auto-regressive mesh generation models. XSpecMesh employs a lightweight, multi-head speculative decoding scheme to predict multiple tokens in parallel within a single forward pass, thereby accelerating inference. We further propose a verification and resampling strategy: the backbone model verifies each predicted token and resamples any tokens that do not meet the quality criteria. In addition, we propose a distillation strategy that trains the lightweight decoding heads by distilling from the backbone model, encouraging their prediction distributions to align and improving the success rate of speculative predictions. Extensive experiments demonstrate that our method achieves a 1.7× speedup without sacrificing generation quality. Our code will be released.
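The draft-verify-resample loop described above can be illustrated with a minimal toy sketch. This is not the paper's implementation: `backbone_logits`, `draft_heads`, the vocabulary size, and the number of heads `K` are all illustrative stand-ins, and verification uses a simple greedy-match rule (accept a draft token only if it equals the backbone's argmax) rather than the paper's quality criteria.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, K = 16, 4  # toy vocabulary size and number of draft heads (assumptions)

def backbone_logits(prefix):
    # Stand-in for the large auto-regressive backbone: a deterministic
    # toy distribution over the next token, seeded by the prefix.
    seed = (sum(prefix) * 2654435761 + len(prefix)) % (2**32)
    return np.random.default_rng(seed).normal(size=VOCAB)

def draft_heads(prefix):
    # Stand-in for the K lightweight decoding heads. Each head proposes
    # one future token; here a head guesses from a noisy copy of the
    # backbone's distribution, mimicking an imperfectly distilled head.
    tokens = []
    for _ in range(K):
        noisy = backbone_logits(prefix + tokens) + rng.normal(scale=0.5, size=VOCAB)
        tokens.append(int(np.argmax(noisy)))
    return tokens

def speculative_step(prefix):
    """One draft-verify-resample round (greedy verification variant).

    A real implementation would verify all K drafts in a single batched
    backbone pass; the loop below is sequential only for clarity.
    """
    draft = draft_heads(prefix)
    accepted = []
    for tok in draft:
        target = int(np.argmax(backbone_logits(prefix + accepted)))
        if tok == target:
            accepted.append(tok)      # draft token verified, keep it
        else:
            accepted.append(target)   # reject: resample from the backbone
            break                     # discard the remaining draft tokens
    return accepted

out = speculative_step([1, 2, 3])
```

Each round emits between 1 and K tokens: every accepted draft token is free parallel progress, and even a rejection still yields one correct backbone token, so quality matches plain auto-regressive decoding while latency drops with the acceptance rate.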