Fast-SAM3D: 3Dfy Anything in Images but Faster

📅 2026-02-05
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Although SAM3D enables open-world, single-view 3D reconstruction, its high inference latency hinders practical deployment. This work presents the first systematic analysis of its inference dynamics, revealing that the failure of generic acceleration methods stems from neglecting multi-level heterogeneity across modalities and processing stages. Building on this insight, the authors propose a training-free, heterogeneity-aware acceleration framework that integrates three core techniques: modality-aware step caching, joint spatiotemporal token pruning, and spectrum-aware token aggregation. The approach achieves up to 2.67× end-to-end speedup with negligible fidelity loss, establishing a new Pareto frontier for efficient single-view 3D generation.

πŸ“ Abstract
SAM3D enables scalable, open-world 3D reconstruction from complex scenes, yet its deployment is hindered by prohibitive inference latency. In this work, we conduct the \textbf{first systematic investigation} into its inference dynamics, revealing that generic acceleration strategies are brittle in this context. We demonstrate that these failures stem from neglecting the pipeline's inherent multi-level \textbf{heterogeneity}: the kinematic distinctiveness between shape and layout, the intrinsic sparsity of texture refinement, and the spectral variance across geometries. To address this, we present \textbf{Fast-SAM3D}, a training-free framework that dynamically aligns computation with instantaneous generation complexity. Our approach integrates three heterogeneity-aware mechanisms: (1) \textit{Modality-Aware Step Caching} to decouple structural evolution from sensitive layout updates; (2) \textit{Joint Spatiotemporal Token Carving} to concentrate refinement on high-entropy regions; and (3) \textit{Spectral-Aware Token Aggregation} to adapt decoding resolution. Extensive experiments demonstrate that Fast-SAM3D delivers up to \textbf{2.67$\times$} end-to-end speedup with negligible fidelity loss, establishing a new Pareto frontier for efficient single-view 3D generation. Our code is released in https://github.com/wlfeng0509/Fast-SAM3D.
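The abstract names the three mechanisms but includes no pseudocode. As a hedged illustration only, the sketch below shows the general idea behind modality-aware step caching; every name here (`step_fn`, the thresholds, the latent dictionary) is a hypothetical stand-in, not the authors' implementation. The idea: a modality's denoising update is recomputed only when its latent has drifted past a modality-specific threshold, so a smoothly evolving modality (e.g. shape) can reuse a cached update while a sensitive one (e.g. layout) is recomputed every step.

```python
import numpy as np

def cached_denoise(step_fn, latents, num_steps, thresholds):
    """Toy denoising loop with per-modality step caching.

    step_fn(name, x, t) -> updated latent (stand-in for the real model call).
    thresholds: per-modality relative-change cutoffs; a looser cutoff skips
    more model calls for that modality.
    """
    cache = {}   # last computed update (delta) per modality
    prev = {}    # latent snapshot at the time of that update
    calls = {name: 0 for name in latents}
    for t in range(num_steps):
        for name, x in latents.items():
            if name in cache:
                change = (np.linalg.norm(x - prev[name])
                          / (np.linalg.norm(prev[name]) + 1e-8))
                if change < thresholds[name]:
                    # Latent barely moved: reuse the cached update.
                    latents[name] = x + cache[name]
                    continue
            new_x = step_fn(name, x, t)      # full (expensive) model step
            cache[name] = new_x - x
            prev[name] = x
            calls[name] += 1
            latents[name] = new_x
    return latents, calls
```

With a loose threshold for `shape` and a zero threshold for `layout`, only the first shape step is computed and every layout step is, mirroring the decoupling the abstract describes.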
Problem

Research questions and friction points this paper is trying to address.

3D reconstruction
inference latency
open-world
single-view 3D generation
computational efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

heterogeneity-aware acceleration
modality-aware step caching
spatiotemporal token carving
spectral-aware token aggregation
training-free 3D reconstruction