🤖 AI Summary
Traditional deep learning and Transformer models do not inherently encode the rotational and translational symmetries of 3D Euclidean space (the group SE(3)), resulting in poor sample efficiency and limited generalization. To address this, we present a systematic survey of SE(3)-equivariant neural networks for vision-based robotic learning and control, grounded in group representation theory and Lie algebras. Our work unifies the architectural evolution of such models across the full perception-decision-control stack. We introduce a comprehensive taxonomy of SE(3)-equivariant methods spanning imitation learning, reinforcement learning, and geometric control, highlighting their theoretical advantages in sample efficiency, cross-pose generalization, and physical consistency. The proposed framework integrates equivariant convolutions, SE(3)-Transformers, and multimodal robot learning paradigms, identifying key pathways toward improved robustness, data efficiency, and multimodal synergy.
📝 Abstract
Recent advances in deep learning and Transformers have driven major breakthroughs in robotics through techniques such as imitation learning, reinforcement learning, and LLM-based multimodal perception and decision-making. However, conventional deep learning and Transformer models often struggle to process data with inherent symmetries and invariances, typically relying on large datasets or extensive data augmentation. Equivariant neural networks overcome these limitations by explicitly building symmetry and invariance into their architectures, improving both efficiency and generalization. This tutorial survey reviews a wide range of equivariant deep learning and control methods for robotics, from classic to state-of-the-art, with a focus on SE(3)-equivariant models that exploit the natural 3D rotational and translational symmetries in visual robotic manipulation and control design. Using unified mathematical notation, we first review key concepts from group theory, matrix Lie groups, and Lie algebras. We then introduce foundational group-equivariant neural network designs and show how group equivariance follows from their structure. Next, we discuss applications of SE(3)-equivariant neural networks in robotics for imitation learning and reinforcement learning. SE(3)-equivariant control design is also reviewed from the perspective of geometric control. Finally, we highlight challenges and future directions for equivariant methods in developing more robust, sample-efficient, and multimodal real-world robotic systems.
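To make the equivariance property concrete, the following NumPy sketch (a toy illustration, not an architecture from the survey) numerically checks the defining condition f(Rx) = Rf(x) for a simple rotation-equivariant point-cloud layer that gates each point by a function of its rotation-invariant norm; the layer `equivariant_layer` and its weight `w` are hypothetical.

```python
import numpy as np

def equivariant_layer(points, w=0.5):
    # Scale each 3D point by a function of its norm:
    # f(x) = phi(||x||) * x. Since rotations preserve norms,
    # f(R x) = phi(||x||) * R x = R f(x), i.e. the layer is SO(3)-equivariant.
    norms = np.linalg.norm(points, axis=-1, keepdims=True)
    return np.tanh(w * norms) * points

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))            # a toy 3D point cloud
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = Q * np.sign(np.linalg.det(Q))      # random rotation matrix in SO(3)

lhs = equivariant_layer(X @ R.T)       # rotate the input, then apply the layer
rhs = equivariant_layer(X) @ R.T       # apply the layer, then rotate the output
print(np.allclose(lhs, rhs))           # True: the layer commutes with rotations
```

A conventional MLP applied to the flattened coordinates would fail this check, which is precisely the gap that equivariant architectures close without resorting to rotation augmentation of the training data.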