iFlame: Interleaving Full and Linear Attention for Efficient Mesh Generation

📅 2025-03-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Generating high-resolution 3D meshes with Transformers faces a fundamental trade-off: full attention models long-range structure well but incurs prohibitive O(N²) computational cost, whereas linear attention scales efficiently yet loses global context. Method: We propose iFlame, the first alternating Transformer framework that integrates full- and linear-attention modules. Contribution/Results: (1) An alternating stack of full- and linear-attention blocks balances representational capacity and efficiency; (2) An hourglass architecture coupled with a KV-cache compression scheme accelerates training, nearly doubles inference speed, and reduces KV memory consumption by 87.5%. Evaluated on ShapeNet and Objaverse, iFlame trains on 39K meshes (up to 4K faces each) in two days on four GPUs, matching the generation quality of full-attention baselines while significantly reducing GPU memory footprint and runtime.
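
The alternating stack is the core architectural idea. Below is a minimal PyTorch sketch of one way to realize it: decoder blocks that alternate causal softmax self-attention with causal linear attention built on the elu(x)+1 feature map. The dimensions, the 1:1 alternation ratio, and the feature map are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of an interleaved full/linear attention decoder stack (assumed config).
import torch
import torch.nn as nn
import torch.nn.functional as F


class LinearAttention(nn.Module):
    """Causal linear attention with the elu(x)+1 feature map: O(N) in sequence length."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.out = nn.Linear(dim, dim)

    def forward(self, x):
        b, n, d = x.shape
        h = self.heads
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q, k, v = (t.view(b, n, h, d // h).transpose(1, 2) for t in (q, k, v))
        q, k = F.elu(q) + 1, F.elu(k) + 1  # positive feature map
        # Causal prefix sums of k v^T and k replace the N x N attention matrix.
        # (Clearest formulation, not the most memory-efficient one.)
        kv = torch.cumsum(torch.einsum("bhnd,bhne->bhnde", k, v), dim=2)
        z = torch.cumsum(k, dim=2)
        num = torch.einsum("bhnd,bhnde->bhne", q, kv)
        den = torch.einsum("bhnd,bhnd->bhn", q, z).clamp(min=1e-6)
        out = (num / den.unsqueeze(-1)).transpose(1, 2).reshape(b, n, d)
        return self.out(out)


class FullAttention(nn.Module):
    """Standard causal softmax self-attention: O(N^2) but strong global context."""

    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        n = x.size(1)
        mask = torch.ones(n, n, device=x.device).triu(1).bool()  # True = masked out
        out, _ = self.attn(x, x, x, attn_mask=mask)
        return out


class InterleavedStack(nn.Module):
    """Alternate linear- and full-attention blocks, each with a pre-norm MLP."""

    def __init__(self, dim=512, depth=8, heads=8):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(depth):
            attn = FullAttention(dim, heads) if i % 2 else LinearAttention(dim, heads)
            self.layers.append(nn.ModuleList([
                nn.LayerNorm(dim), attn, nn.LayerNorm(dim),
                nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)),
            ]))

    def forward(self, x):
        for norm1, attn, norm2, mlp in self.layers:
            x = x + attn(norm1(x))
            x = x + mlp(norm2(x))
        return x
```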

📝 Abstract
This paper proposes iFlame, a novel transformer-based network architecture for mesh generation. While attention-based models have demonstrated remarkable performance in mesh generation, their quadratic computational complexity limits scalability, particularly for high-resolution 3D data. Conversely, linear attention mechanisms offer lower computational costs but often struggle to capture long-range dependencies, leading to suboptimal results. To address this trade-off, we propose an interleaving autoregressive mesh generation framework that combines the efficiency of linear attention with the expressive power of full attention. To further improve efficiency and exploit the inherent structure of mesh representations, we integrate this interleaving approach into an hourglass architecture. Our approach reduces training time while achieving performance comparable to pure attention-based models. To improve inference efficiency, we implement a caching algorithm that almost doubles the speed and reduces the KV cache size by seven-eighths compared to the original Transformer. We evaluate our framework on ShapeNet and Objaverse, demonstrating its ability to generate high-quality 3D meshes efficiently. Our results indicate that the proposed interleaving framework effectively balances computational efficiency and generative performance, making it a practical solution for mesh generation. Training takes only two days on four GPUs with 39k Objaverse meshes of up to 4k faces.
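
The hourglass design referenced in the abstract shortens the token sequence in the middle of the network, so most layers run on far fewer tokens, and then re-expands it at the end. A minimal sketch follows; the factor-of-3 merge (e.g. three coordinate tokens per vertex) mirrors a common mesh tokenization and is an assumption about this paper's exact pooling, and the causal shift needed for autoregressive decoding is omitted for brevity.

```python
# Sketch of an hourglass wrapper around token-level stacks (assumed pooling factor).
import torch
import torch.nn as nn
import torch.nn.functional as F


class Hourglass(nn.Module):
    def __init__(self, dim=512, pool=3, make_stack=lambda: nn.Identity()):
        super().__init__()
        self.pool = pool
        self.pre = make_stack()    # a few layers at full resolution
        self.mid = make_stack()    # most layers at 1/pool resolution
        self.post = make_stack()   # a few layers at full resolution
        self.down = nn.Linear(dim * pool, dim)  # merge `pool` tokens into one
        self.up = nn.Linear(dim, dim * pool)    # expand one coarse token back

    def forward(self, x):
        b, n, d = x.shape
        x = self.pre(x)
        pad = (-n) % self.pool                  # pad so the length divides evenly
        coarse = F.pad(x, (0, 0, 0, pad)).reshape(b, -1, self.pool * d)
        coarse = self.mid(self.down(coarse))    # heavy computation at coarse scale
        fine = self.up(coarse).reshape(b, -1, d)[:, :n]
        return self.post(x + fine)              # residual fusion, then refine
```

In practice the `make_stack` slot would hold interleaved full/linear blocks like those sketched above, so the quadratic layers mostly operate on the shortened sequence.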
Problem

Research questions and friction points this paper is trying to address.

Balancing computational efficiency and generative performance in mesh generation
Combining full and linear attention for scalable 3D mesh creation
Reducing training time and KV cache size while maintaining quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interleaving full and linear attention mechanisms
Hourglass architecture for enhanced efficiency
Caching algorithm for faster inference (see the sketch after this list)
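
The cache behaviour behind the inference speedup can be seen by contrasting how the two attention types decode one token at a time. In the minimal sketch below (class names, shapes, and the elu(x)+1 feature map are illustrative assumptions, not the paper's caching algorithm), a full-attention layer must append a key/value pair for every generated token, while a linear-attention layer keeps only a fixed-size running state.

```python
# Sketch of per-layer decoding state: growing KV cache vs. constant linear-attention state.
import torch
import torch.nn.functional as F


class FullAttnCache:
    """Softmax attention at decode time: memory grows as O(N * d) per layer."""

    def __init__(self):
        self.k, self.v = [], []

    def step(self, q, k, v):                 # q, k, v: 1-D tensors of size d
        self.k.append(k)
        self.v.append(v)
        K, V = torch.stack(self.k), torch.stack(self.v)
        w = torch.softmax(q @ K.t() / q.numel() ** 0.5, dim=-1)
        return w @ V


class LinearAttnState:
    """Linear attention at decode time: constant O(d^2) state, independent of N."""

    def __init__(self, dim):
        self.kv = torch.zeros(dim, dim)      # running sum of outer(k, v)
        self.z = torch.zeros(dim)            # running sum of k

    def step(self, q, k, v):
        q, k = F.elu(q) + 1, F.elu(k) + 1
        self.kv += torch.outer(k, v)
        self.z += k
        return (q @ self.kv) / (q @ self.z).clamp(min=1e-6)
```

With a 1:1 alternation, roughly half of the layers need no growing cache at all, and the hourglass keeps most of the remaining full-attention layers on a shortened sequence; this is one plausible route to a reduction on the order of seven-eighths, though the paper's exact accounting may differ.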