🤖 AI Summary
Existing SE(3)-equivariant convolutional layers for 3D point clouds struggle to be continuous, local, and computationally scalable at the same time, which limits their use on multi-object scenes. Method: We propose a group convolutional layer that is local and SE(3)-equivariant over a continuous domain, avoiding both the discretization and the global-rotation-equivariance assumptions of prior work. The approach constructs adaptive local reference frames, parameterizes kernels continuously, and performs SE(3)-equivariant feature alignment and interpolation. Contribution/Results: The layer is SE(3)-equivariant by construction while remaining computationally feasible for larger inputs. It achieves competitive or superior performance on 3D object classification and semantic segmentation benchmarks with negligible computational overhead, compared with prior discretized or globally equivariant SE(3) methods.
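The idea of describing each neighbourhood in an adaptive local reference frame (LRF) can be illustrated with a small NumPy sketch. It assumes a PCA-based frame with a third-moment sign rule, a common LRF recipe that is not necessarily the construction used in this work; the functions `local_reference_frame`, `local_coordinates`, and `random_rotation` are hypothetical helpers written for illustration. The sketch shows the basic property LRF-based layers rely on: the frame rotates with the data, so neighbour coordinates expressed in it are unchanged by any rotation plus translation.

```python
import numpy as np


def local_reference_frame(points: np.ndarray, center: np.ndarray) -> np.ndarray:
    """Return a 3x3 rotation matrix whose columns are the LRF axes at `center`.

    Hypothetical construction (PCA + third-moment sign rule), used here only
    to illustrate the idea; it is not necessarily the paper's frame.
    """
    offsets = points - center                   # translation cancels out here
    cov = offsets.T @ offsets / len(points)     # 3x3 scatter matrix about the center
    _, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    x_axis = eigvecs[:, 2]                      # direction of largest spread
    z_axis = eigvecs[:, 0]                      # direction of smallest spread
    # Eigenvectors are only defined up to sign; flip each axis so that the
    # third moment of the projected neighbours is positive (an odd function
    # of the axis, so the choice rotates along with the data).
    if np.sum((offsets @ x_axis) ** 3) < 0:
        x_axis = -x_axis
    if np.sum((offsets @ z_axis) ** 3) < 0:
        z_axis = -z_axis
    y_axis = np.cross(z_axis, x_axis)           # completes a right-handed frame
    return np.stack([x_axis, y_axis, z_axis], axis=1)


def local_coordinates(points: np.ndarray, center: np.ndarray) -> np.ndarray:
    """Express neighbour offsets in the LRF; invariant to rotation + translation."""
    return (points - center) @ local_reference_frame(points, center)


def random_rotation(rng: np.random.Generator) -> np.ndarray:
    """Sample a proper 3D rotation (det = +1) via QR decomposition."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))                 # fix the sign ambiguity of QR
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]                      # turn a reflection into a rotation
    return q


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Anisotropic cloud so the three scatter eigenvalues are well separated.
    pts = rng.normal(size=(64, 3)) * np.array([3.0, 2.0, 0.5])
    R, t = random_rotation(rng), rng.normal(size=3)
    moved = pts @ R.T + t                       # apply g = (R, t) to every point
    same = np.allclose(local_coordinates(pts, pts[0]),
                       local_coordinates(moved, moved[0]))
    print(same)  # True: the local description does not see the SE(3) motion
```

A kernel evaluated on these local coordinates receives the same input regardless of the scene's pose, and keeping track of the frame itself (which transforms as R times the original frame) is what allows features to be made equivariant rather than merely invariant. PCA frames become unstable when two eigenvalues are nearly equal or a patch is nearly symmetric, which is one reason frame construction needs care in practice.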
📝 Abstract
Extending the translation equivariance of convolutional neural networks to larger symmetry groups has been shown to reduce sample complexity and enable more discriminative feature learning. Further, exploiting additional symmetries allows more weight sharing than standard convolutions, enhancing network expressivity without increasing the parameter count. However, extending the equivariance of a convolution layer comes at a computational cost. In particular, for 3D data, extending equivariance to the SE(3) group (rotations and translations) results in a 6D convolution operation, which is intractable for large inputs such as 3D scene scans. While efforts have been made to develop efficient SE(3)-equivariant networks, existing approaches rely on discretization or provide only global rotation equivariance. This limits their applicability to point clouds of scenes composed of multiple objects. This work presents an efficient, continuous, and local SE(3)-equivariant convolution layer for point cloud processing based on general group convolution and local reference frames. Our experiments show that our approach achieves competitive or superior performance across a range of datasets and tasks, including object classification and semantic segmentation, with negligible computational overhead.
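For reference, a group convolution on SE(3) can be written as below, in one common convention (cross-correlation with the left-regular action). The symbols f (input feature map), ψ (kernel), g, h, u (group elements), and L_u (the transformed input) are illustrative notation, and the paper's exact formulation may differ. The last line checks equivariance using the substitution h = u h' and the left invariance of the Haar measure dh.

```latex
\begin{aligned}
&\text{SE(3) elements: } g = (R, t),\quad g\,x = Rx + t,\quad g^{-1}x = R^{\top}(x - t)\\
&\text{Lifting layer: } [f \star \psi](g) = \int_{\mathbb{R}^3} f(x)\,\psi\big(g^{-1}x\big)\,dx\\
&\text{Group convolution: } [f \star \psi](g) = \int_{SE(3)} f(h)\,\psi\big(g^{-1}h\big)\,dh\\
&\text{Equivariance: } (L_u f)(h) = f(u^{-1}h)\;\Rightarrow\;
[(L_u f)\star\psi](g) = \int_{SE(3)} f(h')\,\psi\big(g^{-1}u\,h'\big)\,dh' = [f\star\psi](u^{-1}g)
\end{aligned}
```

Since g ranges over SE(3), which has three translational and three rotational degrees of freedom, every feature map after the lifting layer is a function on a 6D domain. Sampling each of the six dimensions at n points gives n^6 group elements per map (n = 10 already gives 10^6), which illustrates why a dense 6D convolution scales poorly to scene-scale point clouds.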