SeedPolicy: Horizon Scaling via Self-Evolving Diffusion Policy for Robot Manipulation

📅 2026-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion policies suffer significant performance degradation in long-horizon robotic imitation learning as the observation horizon increases. To address this limitation, this work proposes the Self-Evolving Gated Attention (SEGA) module, which recursively updates a time-varying latent state to compress long-horizon observations into a fixed-dimensional representation while filtering out redundant information. SEGA is compatible with both CNN and Transformer backbones and supports end-to-end training, effectively overcoming the temporal modeling bottleneck of diffusion policies with minimal parameter overhead. Evaluated on 50 tasks in RoboTwin 2.0, the method achieves an average performance gain of 36.8% in clean environments and a remarkable 169% improvement under challenging randomized conditions, matching the performance of billion-scale vision-language-action models with one to two orders of magnitude fewer parameters.

📝 Abstract
Imitation Learning (IL) enables robots to acquire manipulation skills from expert demonstrations. Diffusion Policy (DP) models multi-modal expert behaviors but suffers performance degradation as observation horizons increase, limiting long-horizon manipulation. We propose Self-Evolving Gated Attention (SEGA), a temporal module that maintains a time-evolving latent state via gated attention, enabling efficient recurrent updates that compress long-horizon observations into a fixed-size representation while filtering irrelevant temporal information. Integrating SEGA into DP yields Self-Evolving Diffusion Policy (SeedPolicy), which resolves the temporal modeling bottleneck and enables scalable horizon extension with moderate overhead. On the RoboTwin 2.0 benchmark with 50 manipulation tasks, SeedPolicy outperforms DP and other IL baselines. Averaged across both CNN and Transformer backbones, SeedPolicy achieves a 36.8% relative improvement in clean settings and a 169% relative improvement in randomized challenging settings over the DP baseline. Compared to vision-language-action models such as RDT with 1.2B parameters, SeedPolicy achieves competitive performance with one to two orders of magnitude fewer parameters, demonstrating strong efficiency and scalability. These results establish SeedPolicy as a state-of-the-art imitation learning method for long-horizon robotic manipulation. Code is available at: https://github.com/Youqiang-Gui/SeedPolicy.
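The abstract describes SEGA as a recurrent gated-attention update: a fixed-size latent state attends over each new observation frame and a learned gate decides how much of the old state to keep versus how much new information to absorb. The paper's actual architecture is not reproduced here; the following is a minimal NumPy sketch of that general idea, with all shapes, weight names (`Wq`, `Wk`, `Wv`, `Wg`), and the gating form being illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sega_step(h, obs_tokens, Wq, Wk, Wv, Wg, bg):
    """One hypothetical gated-attention recurrent update.

    h          : (d,)   fixed-size latent state carried across timesteps
    obs_tokens : (n, d) feature tokens of the current observation frame
    Returns an updated latent state with the same shape as h.
    """
    q = Wq @ h                                  # query from the running state
    scores = (obs_tokens @ Wk.T) @ q / np.sqrt(len(q))
    attn = softmax(scores)                      # attend over the new frame's tokens
    read = attn @ (obs_tokens @ Wv.T)           # attention readout, shape (d,)
    g = sigmoid(Wg @ np.concatenate([h, read]) + bg)  # per-dimension gate in (0, 1)
    return g * h + (1.0 - g) * read             # gated blend: retain vs. absorb

# Toy dimensions: latent size d, n tokens per observation frame.
d, n = 8, 4
Wq = rng.standard_normal((d, d)); Wk = rng.standard_normal((d, d))
Wv = rng.standard_normal((d, d)); Wg = rng.standard_normal((d, 2 * d))
bg = np.zeros(d)

# Roll the state over a 10-step observation horizon: memory cost stays
# constant in the horizon length, since only h is carried forward.
h = np.zeros(d)
for t in range(10):
    h = sega_step(h, rng.standard_normal((n, d)), Wq, Wk, Wv, Wg, bg)
print(h.shape)  # the compressed representation stays fixed-size: (8,)
```

Because each update is a convex, gated combination of the previous state and the current readout, the state remains a bounded fixed-size summary no matter how long the observation horizon grows, which is the property the paper credits for avoiding DP's degradation at longer horizons.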
Problem

Research questions and friction points this paper is trying to address.

Imitation Learning
Diffusion Policy
Long-horizon Manipulation
Observation Horizon
Robotic Manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Evolving Gated Attention
Diffusion Policy
Long-horizon Manipulation
Imitation Learning
Temporal Modeling
Youqiang Gui
Sichuan University
Yuxuan Zhou
Independent Researcher
Shen Cheng
Megvii Research
Deep Learning
Xinyang Yuan
Sichuan University
Haoqiang Fan
Megvii
Computer Vision
Peng Cheng
Sichuan University
Shuaicheng Liu
University of Electronic Science and Technology of China
Computer Vision · Computational Photography