🤖 AI Summary
Diffusion-based policies suffer from poor generalization and low sample efficiency in robotic control. To address this, we propose a lightweight method for incorporating symmetry that avoids the complexity of fully equivariant networks. Our approach rests on two components: (i) a proof that “eye-in-hand” perception combined with relative action parameterization is inherently SE(3)-invariant; and (ii) a fusion mechanism that integrates frame averaging with an equivariant visual encoder, coupled with relative trajectory representation and pretrained feature extraction. The resulting architecture remains computationally lightweight and conceptually simple, yet matches or surpasses fully equivariant baselines. Empirically, it improves both generalization (across diverse robot poses and unseen environments) and sample efficiency (requiring fewer environment interactions to reach comparable policy performance).
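To make the frame-averaging idea concrete, here is a minimal sketch of symmetrizing a fixed encoder by averaging its features over a small set of transformed inputs. It is not the paper's implementation. The choice of the four 90-degree image rotations as the set of frames is an assumption made purely for illustration, and `make_stub_encoder` is a hypothetical stand-in (a fixed random linear map) for a pretrained encoder.

```python
import numpy as np


def make_stub_encoder(image_size, feature_dim, seed=0):
    """Hypothetical stand-in for a pretrained encoder: a fixed random linear map."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(feature_dim, image_size * image_size))

    def encode(image):
        # Flatten the square image and project it to a feature vector.
        return weights @ image.reshape(-1)

    return encode


def c4_averaged_features(encode, image):
    """Average encoder features over the four 90-degree rotations of the image."""
    rotated_views = [np.rot90(image, k) for k in range(4)]
    return np.mean([encode(view) for view in rotated_views], axis=0)


rng = np.random.default_rng(1)
image = rng.normal(size=(8, 8))
encode = make_stub_encoder(image_size=8, feature_dim=16)
rotated_image = np.rot90(image)

# The raw encoder reacts to the rotation; the averaged features do not.
plain_gap = np.abs(encode(image) - encode(rotated_image)).max()
averaged_gap = np.abs(
    c4_averaged_features(encode, image)
    - c4_averaged_features(encode, rotated_image)
).max()
print(f"plain encoder gap: {plain_gap:.3e}, averaged gap: {averaged_gap:.3e}")
```

Averaging works here because rotating the input only permutes the four views, so their mean is unchanged up to floating-point error. The encoder itself is never modified, which is what makes this kind of symmetrization attractive for pretrained models.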
📝 Abstract
Recently, equivariant neural networks for policy learning have shown promising improvements in sample efficiency and generalization; however, their wide adoption faces substantial barriers due to implementation complexity. Equivariant architectures typically require specialized mathematical formulations and custom network design, posing significant challenges when integrating with modern policy frameworks like diffusion-based models. In this paper, we explore several straightforward and practical approaches to incorporate symmetry benefits into diffusion policies without the overhead of fully equivariant designs. Specifically, we investigate (i) invariant representations via relative trajectory actions and eye-in-hand perception, (ii) integrating equivariant vision encoders, and (iii) symmetric feature extraction with pretrained encoders using Frame Averaging. We first prove that combining eye-in-hand perception with relative or delta action parameterization yields inherent SE(3)-invariance, thus improving policy generalization. We then perform a systematic experimental study of these design choices for integrating symmetry in diffusion policies, and conclude that an invariant representation with equivariant feature extraction significantly improves policy performance. Our method achieves performance on par with or exceeding fully equivariant architectures while greatly simplifying implementation.
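The SE(3)-invariance claim can be checked numerically with a small sketch. It is an illustration under simplifying assumptions, not the paper's proof or code: poses are 4x4 homogeneous matrices, the camera frame is taken to coincide with the gripper frame (a fixed hand-eye offset is omitted), and the observation is modeled as a set of 3D points rather than an image. All function names are hypothetical.

```python
import numpy as np


def random_se3(rng):
    """Sample a random rigid transform as a 4x4 homogeneous matrix."""
    q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(q) < 0:
        q[:, 0] *= -1.0  # flip one axis so q is a proper rotation
    transform = np.eye(4)
    transform[:3, :3] = q
    transform[:3, 3] = rng.normal(size=3)
    return transform


def relative_action(pose_now, pose_next):
    """Next gripper pose expressed in the current gripper frame."""
    return np.linalg.inv(pose_now) @ pose_next


def to_gripper_frame(gripper_pose, points_world):
    """Eye-in-hand observation: world points expressed in the gripper frame."""
    homogeneous = np.hstack([points_world, np.ones((len(points_world), 1))])
    return (np.linalg.inv(gripper_pose) @ homogeneous.T).T[:, :3]


rng = np.random.default_rng(0)
pose_now, pose_next = random_se3(rng), random_se3(rng)
scene_points = rng.normal(size=(5, 3))

# Move the whole scene (robot and objects together) by a random transform g.
g = random_se3(rng)
moved_points = (g @ np.hstack([scene_points, np.ones((5, 1))]).T).T[:, :3]

action_invariant = np.allclose(
    relative_action(pose_now, pose_next),
    relative_action(g @ pose_now, g @ pose_next),
)
observation_invariant = np.allclose(
    to_gripper_frame(pose_now, scene_points),
    to_gripper_frame(g @ pose_now, moved_points),
)
absolute_action_changed = not np.allclose(pose_next, g @ pose_next)
print(action_invariant, observation_invariant, absolute_action_changed)
```

Both checks reduce to the same cancellation: the inverse of (g · pose) times (g · x) equals the inverse of pose times x, so g drops out of the relative action and the gripper-frame observation alike. An absolute, world-frame action target does change under g, which is why it does not share this invariance.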