Speedup Patch: Learning a Plug-and-Play Policy to Accelerate Embodied Manipulation

📅 2026-03-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing embodied manipulation policies are often inefficient because they mimic the temporal rhythm of human demonstrations, while current acceleration methods typically require policy retraining or costly online interactions, limiting their scalability. This work proposes Speedup Patch (SuP), a lightweight, policy-agnostic, plug-and-play acceleration framework that uses only offline data to adaptively downsample redundant action segments via an external scheduler. SuP achieves, for the first time, general-purpose acceleration in a purely offline setting without policy retraining. It introduces a novel safety proxy based on state deviations predicted by a world model and formulates scheduler optimization as a constrained Markov decision process (CMDP). Experiments demonstrate that SuP achieves an average speedup of 1.8× across the LIBERO and BiGym simulation benchmarks as well as real-world tasks, while preserving the original task success rates.
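
To make the plug-and-play idea concrete, here is a minimal sketch of how such an external scheduler might wrap a frozen policy at execution time. The interfaces (`policy.predict_chunk`, `scheduler.keep_mask`) are hypothetical names for illustration, not the paper's actual API; the point is that the base policy is never retrained, and only the scheduler decides which actions in each chunk survive.

```python
class SpeedupWrapper:
    """Hypothetical plug-and-play wrapper: the manipulation policy stays
    frozen, and an external scheduler downsamples its action chunks."""

    def __init__(self, policy, scheduler):
        self.policy = policy        # frozen chunk-based manipulation policy
        self.scheduler = scheduler  # learned downsampling scheduler

    def act(self, obs):
        # The base policy emits a chunk of H actions, paced like the demos.
        chunk = self.policy.predict_chunk(obs)       # array of shape (H, action_dim)
        # The scheduler marks redundant steps for removal given the context.
        keep = self.scheduler.keep_mask(obs, chunk)  # boolean mask of shape (H,)
        # Executing only the kept actions shortens wall-clock execution
        # without touching the policy's weights.
        return chunk[keep]
```

Because the wrapper only filters the policy's output, the same scheduler can in principle be attached to any chunk-producing policy, which is what makes the approach policy-agnostic.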

📝 Abstract
While current embodied policies exhibit remarkable manipulation skills, their execution remains unsatisfactorily slow because they inherit the slow pacing of human demonstrations. Existing acceleration methods typically require policy retraining or costly online interactions, limiting their scalability to large-scale foundation models. In this paper, we propose Speedup Patch (SuP), a lightweight, policy-agnostic framework that enables plug-and-play acceleration using only offline data. SuP introduces an external scheduler that adaptively downsamples the action chunks produced by embodied policies to eliminate redundancy. Specifically, we formalize the optimization of the scheduler as a Constrained Markov Decision Process (CMDP) that maximizes efficiency without compromising task performance. Since direct success evaluation is infeasible in offline settings, SuP introduces world-model-based state deviation as a surrogate metric to enforce safety constraints. By leveraging a learned world model as a virtual evaluator that predicts counterfactual trajectories, the scheduler can be optimized via offline reinforcement learning. Empirical results on simulation benchmarks (LIBERO, BiGym) and real-world tasks show that SuP achieves an overall 1.8× execution speedup for diverse policies while maintaining their original success rates.
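
As a rough illustration of the safety surrogate described above, the sketch below scores a candidate downsampling schedule by rolling both the original and the downsampled action sequences through a learned world model and comparing the predicted end states. The `world_model.step` interface, the Euclidean deviation measure, and the threshold `epsilon` are assumptions made for illustration, not the paper's exact formulation.

```python
import numpy as np

def deviation_cost(world_model, state, actions, keep_mask, epsilon=0.05):
    """Surrogate CMDP safety cost: penalize schedules whose counterfactual
    rollout drifts from the original rollout in the learned world model."""
    # Roll out the original, full-rhythm action sequence.
    s_full = state
    for a in actions:
        s_full = world_model.step(s_full, a)

    # Roll out the counterfactual, downsampled sequence.
    s_fast = state
    for a, keep in zip(actions, keep_mask):
        if keep:
            s_fast = world_model.step(s_fast, a)

    # State deviation stands in for task success, which cannot be
    # measured directly in a purely offline setting.
    deviation = float(np.linalg.norm(np.asarray(s_full) - np.asarray(s_fast)))
    return max(0.0, deviation - epsilon)  # zero cost while within the safety budget
```

An offline RL scheduler would then maximize the number of skipped steps (the efficiency reward) subject to keeping the expected cumulative deviation cost below a budget, matching the CMDP structure in the abstract.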
Problem

Research questions and friction points this paper is trying to address.

embodied manipulation
execution speedup
offline acceleration
policy efficiency
human demonstration pacing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Speedup Patch
offline reinforcement learning
Constrained MDP
world model
embodied manipulation
👥 Authors

Zhichao Wu
National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China

Junyin Ye
National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China

Zhilong Zhang
Nanjing University
Reinforcement Learning, Deep Learning

Yihao Sun
Mila, University of Montreal
Reinforcement Learning, Deep Learning

Haoxin Lin
Nanjing University
Reinforcement Learning, Robotics

Jiaheng Luo
National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China

Haoxiang Ren
National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Artificial Intelligence, Nanjing University, China

Lei Yuan
Nanjing University
Machine Learning, Reinforcement Learning, Multi-Agent Systems, Embodied AI

Yang Yu
Professor, Nanjing University
Artificial Intelligence, Reinforcement Learning, Evolutionary Algorithms