🤖 AI Summary
Autoregressive mesh generation models suffer from high inference latency because they require thousands to tens of thousands of sequential token predictions. To address this, we propose Multi-head Speculative Decoding, which predicts multiple candidate vertex/patch tokens in parallel under topological constraints and integrates a lightweight geometric validator with a dynamic resampling mechanism to preserve geometric fidelity. Furthermore, we introduce a knowledge-distillation-based training paradigm for the decoding heads that aligns their prediction distributions with the backbone's, significantly reducing validation overhead. Our method maintains identical topological validity and reconstruction quality, as measured by Chamfer Distance (CD) and F-Score, while achieving an average 1.7× inference speedup across multiple benchmarks. This work establishes a scalable pathway toward real-time, high-fidelity 3D mesh generation.
📝 Abstract
Current auto-regressive models can generate high-quality, topologically precise meshes; however, they necessitate thousands, or even tens of thousands, of next-token predictions during inference, resulting in substantial latency. We introduce XSpecMesh, a quality-preserving acceleration method for auto-regressive mesh generation models. XSpecMesh employs a lightweight, multi-head speculative decoding scheme to predict multiple tokens in parallel within a single forward pass, thereby accelerating inference. We further propose a verification and resampling strategy: the backbone model verifies each predicted token and resamples any tokens that do not meet the quality criteria. In addition, we propose a distillation strategy that trains the lightweight decoding heads by distilling from the backbone model, encouraging their prediction distributions to align and improving the success rate of speculative predictions. Extensive experiments demonstrate that our method achieves a 1.7× speedup without sacrificing generation quality. Our code will be released.
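The draft-verify-resample loop described above can be illustrated with a minimal toy sketch. This is not the paper's implementation: `backbone_logits`, `draft_heads`, the vocabulary size, and the number of heads `K` are all illustrative stand-ins, and verification uses a simple greedy-match rule (accept a draft token only if it equals the backbone's argmax) rather than the paper's quality criteria.

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB, K = 16, 4  # toy vocabulary size and number of draft heads (assumptions)

def backbone_logits(prefix):
    # Stand-in for the large auto-regressive backbone: a deterministic
    # toy distribution over the next token, seeded by the prefix.
    seed = (sum(prefix) * 2654435761 + len(prefix)) % (2**32)
    return np.random.default_rng(seed).normal(size=VOCAB)

def draft_heads(prefix):
    # Stand-in for the K lightweight decoding heads. Each head proposes
    # one future token; here a head guesses from a noisy copy of the
    # backbone's distribution, mimicking an imperfectly distilled head.
    tokens = []
    for _ in range(K):
        noisy = backbone_logits(prefix + tokens) + rng.normal(scale=0.5, size=VOCAB)
        tokens.append(int(np.argmax(noisy)))
    return tokens

def speculative_step(prefix):
    """One draft-verify-resample round (greedy verification variant).

    A real implementation would verify all K drafts in a single batched
    backbone pass; the loop below is sequential only for clarity.
    """
    draft = draft_heads(prefix)
    accepted = []
    for tok in draft:
        target = int(np.argmax(backbone_logits(prefix + accepted)))
        if tok == target:
            accepted.append(tok)      # draft token verified, keep it
        else:
            accepted.append(target)   # reject: resample from the backbone
            break                     # discard the remaining draft tokens
    return accepted

out = speculative_step([1, 2, 3])
```

Each round emits between 1 and K tokens: every accepted draft token is free parallel progress, and even a rejection still yields one correct backbone token, so quality matches plain auto-regressive decoding while latency drops with the acceptance rate.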