Canonical Policy: Learning Canonical 3D Representation for Equivariant Policy

📅 2025-05-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the generalization bottleneck of visual imitation learning across diverse objects, scene layouts, and camera viewpoints, this paper proposes an SE(3)-equivariant policy framework grounded in canonical 3D representations. Methodologically, it integrates SE(3)-equivariant neural networks, generative policy modeling, and multi-configuration cross-domain transfer training. Key contributions include: (i) the first rigorous theoretical formulation of a canonical 3D point cloud space satisfying strict SE(3) equivariance; (ii) explicit decoupling of geometric canonicalization from policy learning, balancing interpretability and representational capacity; and (iii) unified normalization for both in-distribution and out-of-distribution point clouds. Evaluated on 12 simulated and 4 real-robot tasks (16 total configurations), the method achieves average performance gains of 18.0% (simulation) and 37.6% (real-world) over state-of-the-art approaches, demonstrating significantly improved generalization and sample efficiency.

📝 Abstract
Visual imitation learning has achieved remarkable progress in robotic manipulation, yet generalization to unseen objects, scene layouts, and camera viewpoints remains a key challenge. Recent advances address this by using 3D point clouds, which provide geometry-aware, appearance-invariant representations, and by incorporating equivariance into policy architectures to exploit spatial symmetries. However, existing equivariant approaches often lack interpretability and rigor due to unstructured integration of equivariant components. We introduce canonical policy, a principled framework for 3D equivariant imitation learning that unifies 3D point cloud observations under a canonical representation. We first establish a theory of 3D canonical representations, enabling equivariant observation-to-action mappings by grouping both in-distribution and out-of-distribution point clouds to a canonical representation. We then propose a flexible policy learning pipeline that leverages geometric symmetries from the canonical representation and the expressiveness of modern generative models. We validate canonical policy on 12 diverse simulated tasks and 4 real-world manipulation tasks across 16 configurations, involving variations in object color, shape, camera viewpoint, and robot platform. Compared to state-of-the-art imitation learning policies, canonical policy achieves an average improvement of 18.0% in simulation and 37.6% in real-world experiments, demonstrating superior generalization capability and sample efficiency. For more details, please refer to the project website: https://zhangzhiyuanzhang.github.io/cp-website/.
Problem

Research questions and friction points this paper is trying to address.

Generalizing robotic manipulation to unseen objects and viewpoints
Lack of interpretability in equivariant policy architectures
Unifying 3D point clouds under canonical representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses 3D canonical representations for equivariance
Leverages geometric symmetries in policy learning
Combines point clouds with generative models
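
The core idea behind the canonical representation can be sketched in a few lines: map every point cloud to a pose-normalized frame so that any SE(3)-transformed copy of the same cloud lands on the same canonical cloud, making the downstream observation-to-action mapping equivariant by construction. The snippet below is a minimal classical stand-in (centroid centering plus sign-fixed PCA), not the paper's learned canonicalization; all names are illustrative, and it assumes the cloud has distinct principal variances and nonzero skewness along each axis.

```python
import numpy as np

def canonicalize(points):
    """Map a point cloud to a canonical pose via centroid + sign-fixed PCA.

    Illustrative stand-in for a learned canonicalization; assumes distinct
    principal variances and nonzero per-axis skewness so the frame is unique.
    """
    centered = points - points.mean(axis=0)           # remove translation
    _, eigvecs = np.linalg.eigh(centered.T @ centered)
    axes = eigvecs[:, ::-1]                           # columns in decreasing variance
    proj = centered @ axes
    signs = np.sign((proj ** 3).sum(axis=0))          # resolve per-axis sign via skewness
    signs[signs == 0] = 1.0
    axes = axes * signs
    if np.linalg.det(axes) < 0:                       # enforce a right-handed frame
        axes[:, 2] *= -1.0
    return centered @ axes

rng = np.random.default_rng(0)
# A skewed, anisotropic cloud so principal axes and their signs are well defined.
cloud = rng.gamma(2.0, 1.0, size=(200, 3)) * np.array([3.0, 2.0, 1.0])

# Apply a random SE(3) transform: proper rotation R plus translation t.
q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
R = q if np.linalg.det(q) > 0 else -q
t = np.array([0.5, -1.0, 2.0])
moved = cloud @ R.T + t

c1, c2 = canonicalize(cloud), canonicalize(moved)
# Both copies land on (numerically) the same canonical cloud.
```

Grouping inputs to a shared canonical frame like this is what lets a single policy treat all SE(3)-transformed views of a scene identically; the paper replaces this hand-crafted PCA frame with a learned, theoretically grounded canonicalization.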
Zhiyuan Zhang
Purdue University, West Lafayette, USA
Zhengtong Xu
PhD candidate at Purdue University
Robot Learning
Jai Nanda Lakamsani
Purdue University, West Lafayette, USA
Yu She
Assistant Professor, Purdue University
Robotic Manipulation · Mechanism Design · Tactile Sensing · Robot Learning