🤖 AI Summary
Existing equivariant methods suffer from high computational cost, reliance on single-modality inputs, and instability when combined with fast sampling, which makes it hard to balance efficiency and performance. This work proposes E3Flow, a framework that, for the first time, unifies efficient rectified flow with stable multimodal equivariant learning. E3Flow builds on spherical harmonic representations to ensure rigorous SO(3) equivariance and introduces an invariant Feature Enhancement Module (FEM) that dynamically fuses point cloud and image information. Evaluated on eight simulated manipulation tasks from MimicGen, E3Flow improves the average success rate by 3.12% over the state-of-the-art Spherical Diffusion Policy (SDP) while achieving a 7× inference speedup. Its effectiveness is further validated in four real-world robotic experiments.
📝 Abstract
While existing equivariant methods enhance data efficiency, they suffer from high computational cost, reliance on single-modality inputs, and instability when combined with fast-sampling methods. In this work, we propose E3Flow, a novel framework that addresses these limitations of equivariant diffusion policies and, for the first time, unifies efficient rectified flow with stable, multi-modal equivariant learning. Our framework is built upon spherical harmonic representations to ensure rigorous SO(3) equivariance. We introduce a novel invariant Feature Enhancement Module (FEM) that dynamically fuses hybrid visual modalities (point clouds and images), injecting rich visual cues into the spherical harmonic features. We evaluate E3Flow on 8 manipulation tasks from MimicGen and further conduct 4 real-world experiments to validate its effectiveness in physical environments. Simulation results show that E3Flow achieves a 3.12% improvement in average success rate over the state-of-the-art Spherical Diffusion Policy (SDP) while simultaneously delivering a 7x inference speedup. E3Flow thus demonstrates a new and highly effective trade-off among task performance, inference efficiency, and data efficiency for robotic policy learning. Code: https://github.com/zql-kk/E3Flow.
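The SO(3) equivariance property mentioned in the abstract can be stated concretely: rotating the observation by R should rotate the predicted action by the same R, that is, f(Rx) = R f(x). The sketch below is only an illustration of that property under stated assumptions, not E3Flow's network. `toy_policy`, `random_rotation`, and `equivariance_error` are hypothetical names introduced here, and the toy map uses a norm-weighted mean rather than spherical harmonics.

```python
import numpy as np


def random_rotation(rng):
    """Draw a random 3x3 rotation matrix (an element of SO(3)) via QR."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))  # fix column signs of the orthogonal factor
    if np.linalg.det(q) < 0:     # turn a reflection into a proper rotation
        q[:, 0] = -q[:, 0]
    return q


def toy_policy(points):
    """Hypothetical equivariant map: point cloud (N, 3) -> one 3D vector.

    Each point is weighted by its norm, which is a rotation-invariant
    scalar, so the weighted mean rotates together with the input.
    """
    weights = np.linalg.norm(points, axis=1, keepdims=True)
    return (weights * points).mean(axis=0)


def equivariance_error(policy, points, rotation):
    """Largest absolute entry of policy(R x) - R policy(x)."""
    rotated_input = points @ rotation.T  # apply R to every row vector
    return np.max(np.abs(policy(rotated_input) - rotation @ policy(points)))


rng = np.random.default_rng(0)
points = rng.normal(size=(64, 3))
rotation = random_rotation(rng)
print(equivariance_error(toy_policy, points, rotation))
```

For an equivariant map the printed error should sit at floating-point round-off level. A map that adds a fixed offset to its output, for example, would fail the same check, since the offset does not rotate with the input.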