PointTransformerX: Portable and Efficient 3D Point Cloud Processing without Sparse Algorithms

📅 2026-04-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of deploying 3D point cloud perception models on non-NVIDIA platforms by proposing a fully PyTorch-native vision transformer backbone for 3D point clouds that eliminates all custom CUDA operators and external dependencies. The architecture models spatial relationships directly in self-attention without explicit neighborhood construction, using 3D-GS-RoPE for geometry-aware positional encoding. It replaces the sparse convolutional patch embedding with a linear projection, redesigns the feed-forward network to be lighter, and scales attention windows at inference time to improve accuracy without retraining. On ScanNet, the model achieves 98.7% of PointTransformer V3's accuracy with 79.2% fewer parameters, runs 1.6× faster, requires only 253 MB of memory, and executes natively on NVIDIA GPUs, AMD GPUs (via ROCm), and CPUs.
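To illustrate why an attention window can be changed at inference time without retraining, here is a minimal sketch of single-head windowed self-attention in plain Python. It is not the PTX implementation: the function name `window_attention`, the assumption that points are already ordered into a 1D sequence, and the fixed consecutive windows are all illustrative choices, and the summary does not say how PTX orders points or picks its window sizes. The point of the sketch is only that `window` is a runtime argument that touches no learned weights.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def window_attention(q, k, v, window):
    """Single-head attention restricted to consecutive windows.

    q, k, v: lists of per-point feature vectors (lists of floats), assumed
    to be already ordered so that nearby points sit close in the sequence.
    window: number of points per attention window (hypothetical parameter;
    it is not tied to any learned weight, so it can differ between
    training and inference).
    """
    n, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for start in range(0, n, window):
        idx = range(start, min(start + window, n))
        for i in idx:
            # Scores are computed only against points in the same window.
            scores = [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in idx]
            weights = softmax(scores)
            out.append([
                sum(w * v[j][c] for w, j in zip(weights, idx))
                for c in range(len(v[0]))
            ])
    return out
```

Calling `window_attention(q, k, v, window=4)` and then `window_attention(q, k, v, window=8)` on the same inputs reuses the same projections; only the set of points each query can attend to changes.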

📝 Abstract

3D point cloud perception remains tightly coupled to custom CUDA operators for spatial operations, limiting portability and efficiency on non-NVIDIA hardware, including AMD and embedded platforms. We introduce PointTransformerX (PTX), a fully PyTorch-native vision transformer backbone for 3D point clouds, removing all custom CUDA operators and external libraries while retaining competitive accuracy. PTX introduces 3D-GS-RoPE, a rotary positional embedding that encodes 3D spatial relationships directly in self-attention without neighborhood construction, and further replaces sparse convolutional patch embedding with a linear projection. PTX explores inference-time scaling of attention windows to improve accuracy without retraining. With a redesigned feed-forward network, PTX achieves 98.7% of PointTransformer V3's accuracy on ScanNet with 79.2% fewer parameters, runs 1.6× faster, and requires just 253 MB of memory. PTX runs natively on NVIDIA GPUs, AMD GPUs (ROCm), and CPUs, providing an efficient and portable foundation for point cloud perception.
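The idea of encoding 3D spatial relationships inside self-attention through a rotary embedding can be sketched as follows. This is a generic axis-wise 3D rotary embedding, not the paper's 3D-GS-RoPE, whose exact formulation is not given in the abstract. The function names `rope_3d` and `dot`, the `base` frequency constant, and the equal split of channels across the x, y, and z axes are assumptions made for illustration. Plain Python is used to keep the example self-contained; the same arithmetic maps directly onto standard tensor operations.

```python
import math


def rope_3d(vec, xyz, base=10000.0):
    """Rotate a feature vector by angles derived from a 3D coordinate.

    vec: list of floats whose length is divisible by 6 (three axis groups,
         each made of 2D pairs).
    xyz: (x, y, z) coordinate of the point.
    base: frequency base (assumed value, borrowed from common 1D RoPE usage).
    """
    d = len(vec)
    if d % 6 != 0:
        raise ValueError("feature length must be divisible by 6")
    group = d // 3       # channels assigned to each axis
    pairs = group // 2   # rotated 2D pairs per axis
    out = []
    for axis, coord in enumerate(xyz):
        chunk = vec[axis * group:(axis + 1) * group]
        for i in range(pairs):
            # Each pair rotates at its own frequency, scaled by the coordinate.
            angle = coord * base ** (-i / pairs)
            c, s = math.cos(angle), math.sin(angle)
            a, b = chunk[2 * i], chunk[2 * i + 1]
            out.extend([a * c - b * s, a * s + b * c])
    return out


def dot(u, v):
    """Plain dot product, standing in for one attention logit."""
    return sum(a * b for a, b in zip(u, v))
```

Because each pair is rotated by an angle proportional to the coordinate, the score `dot(rope_3d(q, p_q), rope_3d(k, p_k))` depends only on the offset `p_k - p_q`: translating both points by the same vector leaves it unchanged. That is how a rotary embedding lets attention see relative geometry without building an explicit neighbor list.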

Problem

Research questions and friction points this paper is trying to address.

3D point cloud

portability

efficiency

CUDA operators

hardware compatibility

Innovation

Methods, ideas, or system contributions that make the work stand out.

PointTransformerX

3D-GS-RoPE

sparse-free