🤖 AI Summary
To address the weak geometric interpretability and high computational overhead of implicit representations (e.g., PointNet) in 3D recognition, this paper proposes the 3D Gaussian Point Encoder, an explicit per-point geometric representation built on mixtures of learned 3D Gaussians. Because such encoders are difficult to train end-to-end with standard optimizers, the authors combine natural-gradient optimization with distillation from PointNets to stabilize training, and they adapt filtering techniques from 3D Gaussian Splatting to speed up inference. Experiments show that, relative to a PointNet of comparable accuracy, the encoder runs 2.7× faster while using 46% less memory and 88% fewer FLOPs; used as a component in Mamba3D, it runs 1.27× faster with 42% less memory and 54% fewer FLOPs. The encoders are lightweight enough to reach high frame rates on CPU-only devices.
📝 Abstract
In this work, we introduce the 3D Gaussian Point Encoder, an explicit per-point embedding built on mixtures of learned 3D Gaussians. This explicit geometric representation for 3D recognition tasks is a departure from widely used implicit representations such as PointNet. However, it is difficult to learn 3D Gaussian encoders in an end-to-end fashion with standard optimizers. We develop optimization techniques based on natural gradients and distillation from PointNets to find a Gaussian basis that can reconstruct PointNet activations. The resulting 3D Gaussian Point Encoders are faster and more parameter-efficient than traditional PointNets. As in the 3D reconstruction literature, where there has been considerable interest in the move from implicit (e.g., NeRF) to explicit (e.g., Gaussian Splatting) representations, we can take advantage of computational geometry heuristics to accelerate 3D Gaussian Point Encoders further. We extend filtering techniques from 3D Gaussian Splatting to construct encoders that run 2.7 times faster than a PointNet of comparable accuracy while using 46% less memory and 88% fewer FLOPs. Furthermore, we demonstrate the effectiveness of 3D Gaussian Point Encoders as a component in Mamba3D, where they run 1.27 times faster and reduce memory and FLOPs by 42% and 54%, respectively. 3D Gaussian Point Encoders are lightweight enough to achieve high frame rates on CPU-only devices.
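To make the idea of an explicit per-point embedding concrete, the sketch below evaluates a small bank of 3D Gaussians at each point and pools the responses into one global feature. It is an illustration under stated assumptions, not the paper's implementation: the axis-aligned (diagonal) covariances, the PointNet-style max pooling, the hard distance cutoff standing in for Gaussian-Splatting-style filtering, and all names (`gaussian_response`, `encode`, `cutoff_sq`) are hypothetical choices made for this example.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def gaussian_response(point: Vec3, mean: Vec3, inv_var: Vec3,
                      cutoff_sq: Optional[float] = None) -> float:
    """Unnormalized response of one axis-aligned 3D Gaussian to one point."""
    # Squared Mahalanobis distance for a diagonal covariance (assumption).
    d2 = sum(iv * (p - m) ** 2 for p, m, iv in zip(point, mean, inv_var))
    # Hypothetical filter: a Gaussian far from the point contributes nothing,
    # so the exponential can be skipped entirely.
    if cutoff_sq is not None and d2 > cutoff_sq:
        return 0.0
    return math.exp(-0.5 * d2)


def encode(points: Sequence[Vec3], means: Sequence[Vec3],
           inv_vars: Sequence[Vec3],
           cutoff_sq: Optional[float] = None) -> List[float]:
    """Embed each point with one channel per Gaussian, then max-pool over points."""
    per_point = [
        [gaussian_response(p, m, iv, cutoff_sq) for m, iv in zip(means, inv_vars)]
        for p in points
    ]
    # Channel-wise max over all points (PointNet-style pooling, assumed here).
    return [max(channel) for channel in zip(*per_point)]


# Toy data: two points, one nearby Gaussian and one distant Gaussian.
points = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
means = [(0.0, 0.0, 0.0), (5.0, 5.0, 5.0)]
inv_vars = [(1.0, 1.0, 1.0), (1.0, 1.0, 1.0)]

feature = encode(points, means, inv_vars, cutoff_sq=9.0)  # cutoff at 3 sigma
```

With the cutoff in place, the distant Gaussian is skipped for both points, and its channel differs from the unfiltered value only by a negligible amount. This is the kind of saving that a geometric filter can offer when most Gaussians lie far from most points.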