🤖 AI Summary
Autoregressive action prediction underperforms holistic generative methods in robotic manipulation due to limited contextual modeling and sequential bottlenecks.
Method: This paper proposes a bidirectionally expanded autoregressive learning paradigm, employing a lightweight encoder-only architecture that leverages bidirectional context modeling and coarse-to-fine iterative decoding to expand a single-frame initial state into a full action sequence in logarithmic time.
Contribution/Results: It establishes the first dense autoregressive policy framework with efficient inference, overcoming the inherent limitations of conventional unidirectional autoregression. Evaluated on multi-task robotic manipulation benchmarks, the method achieves state-of-the-art sample efficiency, cross-task generalization, and inference speed, significantly outperforming mainstream generative policies.
📝 Abstract
Mainstream visuomotor policies predominantly rely on generative models for holistic action prediction, while current autoregressive policies, which predict the next token or chunk, have shown suboptimal results. This motivates a search for more effective learning methods to unleash the potential of autoregressive policies for robotic manipulation. This paper introduces a bidirectionally expanded learning approach, termed Dense Policy, to establish a new paradigm for autoregressive policies in action prediction. It employs a lightweight encoder-only architecture to iteratively unfold the action sequence from an initial single frame into the target sequence in a coarse-to-fine manner with logarithmic-time inference. Extensive experiments validate that our Dense Policy has superior autoregressive learning capabilities and can surpass existing holistic generative policies. Our policy, example data, and training code will be publicly available upon publication. Project page: https://selen-suyue.github.io/DspNet/.
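The coarse-to-fine expansion can be sketched as follows. This is a minimal illustration of the logarithmic decoding schedule only, not the authors' implementation: the hypothetical `refine` callable stands in for the encoder-only model, which attends over the entire current sequence (bidirectional context) at each pass.

```python
import math

def dense_decode(initial_action, horizon, refine):
    # Coarse-to-fine expansion sketch: the sequence doubles in length
    # every pass, so reaching a horizon of T action frames takes only
    # ceil(log2(T)) refinement passes instead of T sequential steps.
    # `refine` is a placeholder for the bidirectional encoder, which
    # sees the whole (coarse) sequence at once.
    seq = [initial_action]
    passes = 0
    while len(seq) < horizon:
        # Upsample: duplicate each frame as a coarse guess at the
        # finer temporal resolution, then refine the full sequence.
        seq = [a for a in seq for _ in range(2)]
        seq = refine(seq)
        passes += 1
    return seq[:horizon], passes

# With an identity "model", this exposes the schedule alone:
actions, passes = dense_decode(0.0, horizon=16, refine=lambda s: s)
```

With a horizon of 16 frames, the loop runs 4 passes (1 → 2 → 4 → 8 → 16), matching the claimed logarithmic inference cost.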