Fast-SAM3D: 3Dfy Anything in Images but Faster

📅 2026-02-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Although SAM3D enables open-world, single-view 3D reconstruction, its high inference latency hinders practical deployment. This work presents the first systematic analysis of SAM3D's inference dynamics and shows that generic acceleration methods fail because they neglect multi-level heterogeneity across modalities and processing stages. Building on this insight, it proposes Fast-SAM3D, a training-free, heterogeneity-aware acceleration framework that integrates three techniques: modality-aware step caching, joint spatiotemporal token carving, and spectral-aware token aggregation. The approach achieves up to 2.67× end-to-end speedup with negligible fidelity loss, establishing a new Pareto frontier for efficient single-view 3D generation.

📝 Abstract

SAM3D enables scalable, open-world 3D reconstruction from complex scenes, yet its deployment is hindered by prohibitive inference latency. In this work, we conduct the first systematic investigation into its inference dynamics, revealing that generic acceleration strategies are brittle in this context. We demonstrate that these failures stem from neglecting the pipeline's inherent multi-level heterogeneity: the kinematic distinctiveness between shape and layout, the intrinsic sparsity of texture refinement, and the spectral variance across geometries. To address this, we present Fast-SAM3D, a training-free framework that dynamically aligns computation with instantaneous generation complexity. Our approach integrates three heterogeneity-aware mechanisms: (1) Modality-Aware Step Caching to decouple structural evolution from sensitive layout updates; (2) Joint Spatiotemporal Token Carving to concentrate refinement on high-entropy regions; and (3) Spectral-Aware Token Aggregation to adapt decoding resolution. Extensive experiments demonstrate that Fast-SAM3D delivers up to 2.67× end-to-end speedup with negligible fidelity loss, establishing a new Pareto frontier for efficient single-view 3D generation. Our code is available at https://github.com/wlfeng0509/Fast-SAM3D.
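The abstract does not spell out how step caching works, so the following is only a generic sketch of the idea, not the Fast-SAM3D implementation. It assumes an iterative sampler in which an expensive per-step computation can be reused for several consecutive steps, and that each modality gets its own refresh interval. The names `run_with_step_cache`, `compute_fn`, and `refresh_every` are hypothetical, and the intervals in the usage note are illustrative values, not figures from the paper.

```python
def run_with_step_cache(num_steps, compute_fn, refresh_every):
    """Run an iterative loop, recomputing only every `refresh_every` steps.

    Hypothetical helper: `compute_fn(step)` stands in for an expensive
    per-step model evaluation. On skipped steps the last computed result
    is reused. Returns the per-step outputs and the number of real calls.
    """
    if refresh_every < 1:
        raise ValueError("refresh_every must be at least 1")

    cached = None
    outputs = []
    calls = 0
    for step in range(num_steps):
        # Recompute on the first step and on every refresh boundary.
        if step % refresh_every == 0:
            cached = compute_fn(step)
            calls += 1
        outputs.append(cached)
    return outputs, calls


def expensive_update(step):
    # Placeholder for a costly model call (assumption for illustration).
    return step * step


# Assumed policy: a modality that tolerates stale features refreshes rarely,
# while a sensitive one refreshes on every step.
shape_outputs, shape_calls = run_with_step_cache(8, expensive_update, refresh_every=4)
layout_outputs, layout_calls = run_with_step_cache(8, expensive_update, refresh_every=1)
```

With these illustrative settings, the shape branch evaluates `expensive_update` on 2 of 8 steps and the layout branch on all 8, which shows how per-modality intervals trade computation against freshness.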

Problem

Research questions and friction points this paper is trying to address.

3D reconstruction

inference latency

open-world

single-view 3D generation

computational efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

heterogeneity-aware acceleration

modality-aware step caching

spatiotemporal token carving