🤖 AI Summary
To address unreliable control and slow inference in visuomotor imitation learning for complex non-Markovian tasks, this paper proposes Masked Generative Policy (MGP), a dual-paradigm masked generation and refinement framework. MGP comprises two complementary variants: MGP-Short performs parallel token generation with score-guided iterative refinement of low-confidence tokens, while MGP-Long performs single-shot global trajectory prediction coupled with observation-driven dynamic regeneration. The framework rests on three key components: (1) discrete action tokenization, (2) conditional masked Transformer modeling, and (3) observation-adaptive refinement, which together combine globally coherent trajectory prediction with adaptation to changing environments. Evaluated on 150 Meta-World and LIBERO tasks, MGP improves the average success rate by 9% while cutting per-sequence inference time by up to 35×. In dynamic and missing-observation environments, it improves the average success rate by 60%, and it solves two non-Markovian manipulation scenarios where prior state-of-the-art methods fail.
📝 Abstract
We present Masked Generative Policy (MGP), a novel framework for visuomotor imitation learning. We represent actions as discrete tokens and train a conditional masked transformer that generates tokens in parallel and then rapidly refines only low-confidence tokens. We further propose two new sampling paradigms: MGP-Short, which performs parallel masked generation with score-based refinement for Markovian tasks, and MGP-Long, which predicts full trajectories in a single pass and dynamically refines low-confidence action tokens based on new observations. With globally coherent prediction and robust adaptive execution, MGP-Long enables reliable control on complex non-Markovian tasks that prior methods struggle with. Extensive evaluations on 150 robotic manipulation tasks spanning the Meta-World and LIBERO benchmarks show that MGP achieves both rapid inference and higher success rates than state-of-the-art diffusion and autoregressive policies. Specifically, MGP increases the average success rate by 9% across the 150 tasks while cutting per-sequence inference time by up to 35×. It further improves the average success rate by 60% in dynamic and missing-observation environments, and solves two non-Markovian scenarios where other state-of-the-art methods fail.
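The abstract describes the MGP-Short decoding loop only at a high level: predict all masked action tokens in parallel, commit the high-confidence predictions, and re-mask and regenerate the rest. The sketch below illustrates that loop under stated assumptions; the function names (`toy_policy_logits`, `mgp_short_decode`), the `keep_frac` schedule, and the random stand-in for the trained conditional masked transformer are all illustrative, not the paper's implementation.

```python
import numpy as np

VOCAB, SEQ_LEN, MASK = 16, 8, -1  # toy action-token vocabulary; MASK marks unfilled slots

def toy_policy_logits(tokens, obs):
    """Stand-in for the conditional masked transformer: returns per-position
    logits over the action vocabulary. Here it is random noise seeded from
    the observation and current tokens, purely to keep the sketch runnable."""
    seed = abs(hash((obs, tuple(int(t) for t in tokens)))) % (2**32)
    return np.random.default_rng(seed).normal(size=(SEQ_LEN, VOCAB))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mgp_short_decode(obs, steps=4, keep_frac=0.5):
    """Parallel masked generation with score-based refinement: each round
    predicts every masked position at once, commits the most confident
    predictions, and leaves the rest masked for the next round."""
    tokens = np.full(SEQ_LEN, MASK)
    for _ in range(steps):
        masked = np.flatnonzero(tokens == MASK)
        if masked.size == 0:
            break
        probs = softmax(toy_policy_logits(tokens, obs))
        pred = probs.argmax(axis=-1)   # most likely token per position
        conf = probs.max(axis=-1)      # its probability = confidence score
        # commit the top keep_frac of still-masked positions by confidence
        n_keep = max(1, int(np.ceil(keep_frac * masked.size)))
        keep = masked[np.argsort(conf[masked])[-n_keep:]]
        tokens[keep] = pred[keep]
    return tokens

action_tokens = mgp_short_decode(obs=7)
print(action_tokens)
```

With `keep_frac=0.5` and a sequence of 8 tokens, four rounds suffice to fill every slot (8 → 4 → 2 → 1 → 0 masked positions), which is the source of the speedup over strictly token-by-token autoregressive decoding.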