🤖 AI Summary
To address the low inference efficiency of SE(3)-equivariant diffusion models in unstructured environments, and the resulting trade-off between the data efficiency afforded by strict SE(3) equivariance and inference cost, this work introduces rectified flow into the SE(3)-equivariant diffusion framework for the first time, yielding an efficient generative policy learning method. Our approach preserves exact SE(3) equivariance (i.e., outputs transform consistently under rigid-body rotations and translations) while enabling high-fidelity trajectory generation in a single inference step. This substantially improves accuracy and practicality for long-horizon prediction. On simulation benchmarks, our method achieves superior performance with only one denoising step: it reduces error by 48.5% on the painting task and by 21.9% on the rotating triangle task compared to baseline methods requiring 100 steps. Crucially, it maintains high data efficiency and robust generalization under arbitrary rigid-body transformations.
📝 Abstract
Robotic manipulation in unstructured environments requires generating robust, long-horizon trajectory-level policies conditioned on perceptual observations, and benefits from the data efficiency of SE(3)-equivariant diffusion models. However, these models suffer from high inference-time costs. Inspired by the inference efficiency of rectified flows, we introduce rectification into SE(3)-diffusion models and propose ReSeFlow (Rectifying SE(3)-Equivariant Policy Learning Flows), providing fast, geodesic-consistent policy generation with minimal computation. Crucially, both components employ SE(3)-equivariant networks to preserve rotational and translational symmetry, enabling robust generalization under rigid-body motions. On simulated benchmarks, we find that ReSeFlow with only one inference step achieves better performance, with lower geodesic distance, than the baseline methods, yielding up to a 48.5% error reduction on the painting task and a 21.9% reduction on the rotating triangle task compared to the baseline's 100-step inference. This method combines the advantages of SE(3) equivariance and rectified flow, advancing generative policy learning toward real-world application with both data and inference efficiency.
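The one-step inference claimed above rests on the core property of rectified flow: samples travel along straight (in ReSeFlow's case, geodesic) paths with constant velocity, so a single Euler step can reach the target. The following is a minimal Euclidean toy sketch of that property, not the paper's SE(3) formulation; all names and values here are hypothetical, and the oracle velocity stands in for what a trained equivariant network would predict.

```python
import numpy as np

rng = np.random.default_rng(0)

# Rectified flow interpolates noise x0 and data x1 along straight paths:
#   x_t = (1 - t) * x0 + t * x1,
# and trains a network to predict the constant target velocity x1 - x0.
x0 = rng.standard_normal((5, 2))     # "noise" samples (hypothetical)
x1 = np.array([[1.0, 2.0]] * 5)      # "data" samples (hypothetical)

def velocity(x_t, t, x0, x1):
    # Oracle velocity used for illustration only; in practice a trained
    # (SE(3)-equivariant, in ReSeFlow) network approximates this field.
    return x1 - x0

# Because the velocity is constant along each straight path, a single
# Euler step from t=0 to t=1 lands exactly on the data sample:
x_gen = x0 + 1.0 * velocity(x0, 0.0, x0, x1)
assert np.allclose(x_gen, x1)
```

A learned velocity field only approximates this constant-speed transport, so one-step generation is approximate in practice; the straighter the learned paths, the smaller the one-step error, which is what rectification targets.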