How Many Tokens Do 3D Point Cloud Transformer Architectures Really Need?

📅 2025-11-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work challenges the common assumption that "more tokens yield higher performance" in 3D point cloud Transformers by addressing pervasive token redundancy. The authors propose GitMerge3D, a globally informed, graph-structured token merging method: it models global geometric relationships among tokens as a graph, enabling hierarchical, structure-preserving token clustering and merging, and it retains critical spatial-semantic features during compression. On semantic segmentation and reconstruction tasks, GitMerge3D achieves 90–95% token compression, substantially reducing training and inference memory footprint and computational cost while maintaining competitive accuracy. This is the first systematic study to reveal high token redundancy in large-scale 3D Transformers, and it establishes a scalable, lightweight paradigm for efficient 3D vision.

📝 Abstract
Recent advances in 3D point cloud transformers have led to state-of-the-art results in tasks such as semantic segmentation and reconstruction. However, these models typically rely on dense token representations, incurring high computational and memory costs during training and inference. In this work, we present the finding that tokens are remarkably redundant, leading to substantial inefficiency. We introduce gitmerge3D, a globally informed graph token merging method that can reduce the token count by up to 90-95% while maintaining competitive performance. This finding challenges the prevailing assumption that more tokens inherently yield better performance and highlights that many current models are over-tokenized and under-optimized for scalability. We validate our method across multiple 3D vision tasks and show consistent improvements in computational efficiency. This work is the first to assess redundancy in large-scale 3D transformer models, providing insights into the development of more efficient 3D foundation architectures. Our code and checkpoints are publicly available at https://gitmerge3d.github.io
Problem

Research questions and friction points this paper is trying to address.

Reducing token redundancy in 3D transformers
Maintaining performance while cutting computational costs
Challenging over-tokenization in 3D vision models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Globally informed graph token merging method
Reduces token count by up to 90-95%
Maintains competitive performance across tasks
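The paper itself does not publish its merging algorithm in this listing, but the core idea of reducing token count by merging redundant, similar tokens can be illustrated with a generic greedy sketch. The snippet below is an assumption-laden toy version (name `merge_tokens` and the size-weighted averaging strategy are illustrative, not GitMerge3D's actual global graph formulation): it repeatedly fuses the most cosine-similar pair of token features until only a target fraction remains, e.g. `keep_ratio=0.1` for roughly 90% compression.

```python
import numpy as np

def merge_tokens(tokens, keep_ratio=0.1):
    """Greedy similarity-based token merging (illustrative sketch only,
    not the GitMerge3D algorithm).

    tokens: (N, D) array of token features.
    keep_ratio: fraction of tokens to keep; 0.1 ~ 90% compression.
    """
    tokens = np.asarray(tokens, dtype=np.float64).copy()
    target = max(1, int(round(len(tokens) * keep_ratio)))
    # Track how many original tokens each merged token represents,
    # so each merge is a size-weighted average of its members.
    counts = np.ones(len(tokens))
    while len(tokens) > target:
        # Cosine similarity between all pairs of token features.
        norm = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
        sim = norm @ norm.T
        np.fill_diagonal(sim, -np.inf)  # ignore self-similarity
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        # Merge token j into token i.
        tokens[i] = (counts[i] * tokens[i] + counts[j] * tokens[j]) / (
            counts[i] + counts[j]
        )
        counts[i] += counts[j]
        tokens = np.delete(tokens, j, axis=0)
        counts = np.delete(counts, j)
    return tokens
```

A real implementation would avoid the O(N^2) pairwise recomputation per step (for instance via graph partitioning or bipartite matching), which is where a globally informed graph structure pays off at point cloud scale.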