Masked Generative Policy for Robotic Control

📅 2025-12-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address unreliable control and slow inference in visual-motor imitation learning for complex non-Markovian tasks, this paper proposes the Dual-Paradigm Masked Generation and Refinement framework (MGP). MGP comprises two complementary variants: MGP-Short enables parallel token generation with score-guided iterative refinement of low-confidence tokens; MGP-Long supports single-shot global trajectory prediction coupled with observation-driven dynamic regeneration. We introduce three key innovations: (1) discrete action tokenization, (2) conditional masked Transformer modeling, and (3) observation-adaptive refinement—unifying global trajectory consistency with strong environmental adaptability for the first time. Evaluated on 150 Meta-World and LIBERO tasks, MGP achieves a +9% average success rate improvement and up to 35× faster inference. Under dynamic or occluded conditions, success rates improve by 60%. Moreover, MGP is the first method to systematically resolve two fundamental non-Markovian manipulation challenges.

Technology Category

Application Category

📝 Abstract
We present Masked Generative Policy (MGP), a novel framework for visuomotor imitation learning. We represent actions as discrete tokens, and train a conditional masked transformer that generates tokens in parallel and then rapidly refines only low-confidence tokens. We further propose two new sampling paradigms: MGP-Short, which performs parallel masked generation with score-based refinement for Markovian tasks, and MGP-Long, which predicts full trajectories in a single pass and dynamically refines low-confidence action tokens based on new observations. With globally coherent prediction and robust adaptive execution capabilities, MGP-Long enables reliable control on complex and non-Markovian tasks that prior methods struggle with. Extensive evaluations on 150 robotic manipulation tasks spanning the Meta-World and LIBERO benchmarks show that MGP achieves both rapid inference and superior success rates compared to state-of-the-art diffusion and autoregressive policies. Specifically, MGP increases the average success rate by 9% across 150 tasks while cutting per-sequence inference time by up to 35x. It further improves the average success rate by 60% in dynamic and missing-observation environments, and solves two non-Markovian scenarios where other state-of-the-art methods fail.
Problem

Research questions and friction points this paper is trying to address.

Develops a framework for visuomotor imitation learning in robotics
Enables reliable control on complex and non-Markovian robotic tasks
Improves success rates and inference speed in robotic manipulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Masked transformer generates parallel action tokens
Score-based refinement targets low-confidence tokens
Dynamic trajectory prediction adapts to new observations
🔎 Similar Papers
No similar papers found.
Lipeng Zhuang
Lipeng Zhuang
University of Glasgow
deformable object manipulationrobotic perception and manipulationcomputer vision
Shiyu Fan
Shiyu Fan
University of Glasgow
Computer Graphics
F
Florent P. Audonnet
University of Glasgow, Glasgow, United Kingdom
Y
Yingdong Ru
University of Glasgow, Glasgow, United Kingdom
G
Gerardo Aragon Camarasa
University of Glasgow, Glasgow, United Kingdom
Paul Henderson
Paul Henderson
University of Glasgow
computer visionmachine learning