GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation

📅 2025-12-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Human demonstrations for vision-language-action (VLA) policies are often noisy and imprecise, hindering their application to high-dexterity, long-horizon manipulation tasks. Method: We propose a multi-stage specialized training framework: (1) trajectory filtering via offline reinforcement learning to remove suboptimal demonstrations; (2) morphological-symmetry-aware data augmentation to improve generalization; and (3) latent-space noise prediction to guide online reinforcement learning for fine-grained control. The method integrates vision-language task-progress assessment, sparse-reward Q-function modeling, and symmetric trajectory augmentation. Contribution/Results: Our approach achieves the first autonomous, learning-based robotic shoelace tying (83.3% success rate), demonstrating multi-eyelet threading, millimeter-level positioning, and compliant soft-body interaction. It establishes the first systematic pipeline for adapting general-purpose VLA models into high-precision, task-specific policies.

📝 Abstract
We present GR-RL, a robotic learning framework that turns a generalist vision-language-action (VLA) policy into a highly capable specialist for long-horizon dexterous manipulation. Existing VLA policies rest on the assumption that human demonstrations are optimal. However, we claim that in highly dexterous and precise manipulation tasks, human demonstrations are noisy and suboptimal. GR-RL proposes a multi-stage training pipeline that filters, augments, and reinforces the demonstrations via reinforcement learning. First, GR-RL learns a vision-language-conditioned task-progress function, filters the demonstration trajectories, and keeps only the transitions that contribute positively to progress. Specifically, we show that by directly applying offline RL with a sparse reward, the resulting $Q$-values can be treated as a robust progress function. Next, we introduce morphological symmetry augmentation, which greatly improves the generalization and performance of GR-RL. Lastly, to better align the VLA policy with its deployment behaviors for high-precision control, we perform online RL by learning a latent-space noise predictor. With this pipeline, GR-RL is, to our knowledge, the first learning-based policy that can autonomously lace up a shoe by threading shoelaces through multiple eyelets with an 83.3% success rate, a task requiring long-horizon reasoning, millimeter-level precision, and compliant soft-body interaction. We hope GR-RL provides a step toward enabling generalist robot foundation models to specialize into reliable real-world experts.
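The abstract's first stage, treating sparse-reward offline-RL $Q$-values as a progress function and keeping only progress-making transitions, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Transition` container, the `q_value` callable, and the strict-improvement keep rule are all assumptions for exposition.

```python
# Hypothetical sketch of demonstration filtering with a learned Q-function
# used as a task-progress signal (names and rule are illustrative only).
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Transition:
    obs: Tuple[float, ...]       # stand-in for a visual-language observation
    next_obs: Tuple[float, ...]  # successor observation
    action: Tuple[float, ...]    # demonstrated action


def filter_demo(traj: List[Transition],
                q_value: Callable[[Tuple[float, ...]], float]) -> List[Transition]:
    """Keep only transitions whose successor state scores higher under the
    Q-function, i.e. transitions that contribute positively to progress."""
    return [t for t in traj if q_value(t.next_obs) > q_value(t.obs)]
```

With a toy progress proxy such as `q_value = lambda s: s[0]`, a demonstration segment that backtracks (Q decreases) is dropped while forward-progress segments survive, which mirrors the paper's stated goal of pruning noisy, suboptimal human data before further training.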
Problem

Research questions and friction points this paper is trying to address.

Enhances dexterous manipulation with precision and long-horizon reasoning
Addresses suboptimal human demonstrations in complex robotic tasks
Improves generalization and control alignment for real-world specialization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Filters human demonstrations using offline RL progress function
Applies morphological symmetry augmentation for generalization
Performs online RL with latent noise predictor for precision
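The second innovation bullet, morphological symmetry augmentation, can be illustrated for a bimanual setup: reflect the workspace about the robot's plane of symmetry and swap the left/right arm channels, doubling the effective demonstration data. The mirroring convention (negating the y-coordinate) and the array layout are assumptions for illustration, not the paper's exact scheme.

```python
# Hedged sketch of morphological symmetry augmentation for a bimanual robot:
# mirror end-effector trajectories about the x-z plane and swap the arms.
import numpy as np


def mirror_augment(left: np.ndarray, right: np.ndarray):
    """left, right: (T, 3) arrays of end-effector positions (x, y, z).

    Returns the mirrored trajectory pair: arms swapped, y negated, so the
    augmented episode is a physically valid reflection of the original.
    """
    flip = np.array([1.0, -1.0, 1.0])  # reflect across the x-z plane
    return right * flip, left * flip
```

Applying the same reflection to actions and observations yields a second, symmetry-consistent episode per demonstration, which is one plausible reading of how such augmentation improves generalization.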
👥 Authors
Yunfei Li · ByteDance Seed · Reinforcement Learning, Robotics
Xiao Ma · ByteDance Seed
Jiafeng Xu · ByteDance Seed
Yu Cui · ByteDance Seed
Zhongren Cui · ByteDance Seed
Zhigang Han · ByteDance Seed
Liqun Huang · ByteDance Seed
Tao Kong · ByteDance Research · Robot Foundation Model, Robot Learning, Computer Vision
Yuxiao Liu · ShanghaiTech University · fMRI, neuroscience, NLP, Large Language Model
Hao Niu · KDDI Research, Inc. · Machine Learning
Wanli Peng · ByteDance Seed
Jingchao Qiao · ByteDance Seed
Zeyu Ren · ByteDance Seed
Haixin Shi · ByteDance Research · Robotics, Computer Vision, Augmented Reality, Machine Learning
Zhi Su · ByteDance Seed
Jiawen Tian · ByteDance Seed
Yuyang Xiao · ByteDance Seed
Shenyu Zhang · Southeast University · Natural Language Processing
Liwei Zheng · ByteDance Seed
Hang Li · ByteDance Seed
Yonghui Wu · ByteDance Seed