🤖 AI Summary
Diffusion policies for robotic manipulation train inefficiently because each model must relearn 3D spatial representations from scratch. To address this, we propose hPGA-DP, a hybrid diffusion policy that integrates Projective Geometric Algebra (PGA). This work is the first to incorporate PGA into diffusion-based control: it uses the E(3)-equivariant P-GATr as state encoder and action decoder, and a U-Net or Transformer module for denoising, to balance geometric prior modeling and denoising performance. By explicitly encoding geometric inductive biases for operations such as translations and rotations, hPGA-DP significantly enhances spatial reasoning capability and training efficiency. Evaluations on both simulation and real-robot platforms demonstrate that hPGA-DP achieves a 12.7% absolute improvement in task success rate and accelerates convergence by 41% compared to the pure P-GATr baseline. Our approach establishes a new paradigm for geometry-aware embodied intelligence, bridging principled geometric representation with scalable diffusion-based policy learning.
📝 Abstract
Diffusion policies have become increasingly popular in robot learning due to their reliable convergence in motion generation tasks. At a high level, these policies learn to transform noisy action trajectories into effective ones, conditioned on observations. However, each time such a model is trained in a robotics context, the network must relearn fundamental spatial representations and operations, such as translations and rotations, from scratch in order to ground itself and operate effectively in a 3D environment. Incorporating geometric inductive biases directly into the network can alleviate this redundancy and substantially improve training efficiency. In this paper, we introduce hPGA-DP, a diffusion policy approach that integrates a mathematical framework called Projective Geometric Algebra (PGA) to embed strong geometric inductive biases. PGA is particularly well-suited for this purpose as it provides a unified algebraic framework that naturally encodes geometric primitives, such as points, directions, and rotations, enabling neural networks to reason about spatial structure through interpretable and composable operations. Specifically, we propose a novel diffusion policy architecture that incorporates the Projective Geometric Algebra Transformer (P-GATr), leveraging its E(3)-equivariant properties established in prior work. Our approach adopts a hybrid architecture strategy, using P-GATr as both a state encoder and action decoder, while employing U-Net or Transformer-based modules for the denoising process. Several experiments and ablation studies in both simulated and real-world environments demonstrate that hPGA-DP not only improves task performance and training efficiency through the geometric bias of P-GATr, but also achieves substantially faster convergence through its hybrid model compared to architectures that rely solely on P-GATr.
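To make the E(3)-equivariance property mentioned above concrete, the following is a minimal sketch and is not taken from the paper. A map f on point sets is equivariant when applying a rigid transform g to the input and then f gives the same result as applying f first and then g, i.e. f(g·x) = g·f(x). All function names here are hypothetical. The centroid is only a stand-in for an equivariant layer, not P-GATr itself, and the coordinate-wise maximum is a stand-in for a generic layer with no geometric bias. The check uses a single rotation plus translation; E(3) also includes reflections, which this sketch does not exercise.

```python
import math

def rot_z(theta):
    """3x3 rotation matrix about the z-axis (row-major nested lists)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def apply_rigid(R, t, p):
    """Apply the rigid transform p -> R p + t to a single 3D point."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))

def centroid(points):
    """Equivariant stand-in: the mean of a point set moves with the points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def coord_max(points):
    """Non-equivariant stand-in: per-axis max depends on the chosen frame."""
    return tuple(max(p[i] for p in points) for i in range(3))

def is_equivariant(f, points, R, t, tol=1e-9):
    """Check f(g.x) == g.f(x) for one transform g = (R, t)."""
    lhs = f([apply_rigid(R, t, p) for p in points])   # transform, then f
    rhs = apply_rigid(R, t, f(points))                # f, then transform
    return all(abs(a - b) <= tol for a, b in zip(lhs, rhs))

# Toy "scene": three points, rotated 90 degrees about z and then shifted.
points = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
R, t = rot_z(math.pi / 2), (1.0, 2.0, 3.0)

print(is_equivariant(centroid, points, R, t))   # True
print(is_equivariant(coord_max, points, R, t))  # False
```

A map that fails this check has to learn separately how its output should change for every pose of the scene, which is the redundancy the abstract attributes to networks without geometric inductive biases.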