FetchBot: Object Fetching in Cluttered Shelves via Zero-Shot Sim2Real

📅 2025-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of zero-shot, collision-free robotic grasping of target objects in cluttered shelf environments. We propose FetchBot, a novel framework that introduces a dynamics-aware simulation-to-reality (sim2real) reinforcement learning distillation paradigm. FetchBot integrates voxel-based synthetic scene generation, multi-view depth estimation leveraging foundation models (e.g., Marigold), and a multi-view Transformer-based visual policy to bridge the sim2real gap in both texture rendering and dynamic modeling. Evaluated on a Franka Emika robotic arm, FetchBot achieves over 92% grasp success under unseen objects, poses, and occlusion levels. Crucially, its collision-free success rate improves by 37% over state-of-the-art methods, demonstrating significantly enhanced robustness and safety in constrained spaces, under heavy occlusion, and amid dynamic uncertainty.

📝 Abstract
Object fetching from cluttered shelves is an important capability for robots to assist humans in real-world scenarios. Achieving this task demands robotic behaviors that prioritize safety by minimizing disturbances to surrounding objects, an essential but highly challenging requirement due to restricted motion space, limited fields of view, and complex object dynamics. In this paper, we introduce FetchBot, a sim-to-real framework designed to enable zero-shot generalizable and safety-aware object fetching from cluttered shelves in real-world settings. To address data scarcity, we propose an efficient voxel-based method for generating diverse simulated cluttered shelf scenes at scale and train a dynamics-aware reinforcement learning (RL) policy to generate object fetching trajectories within these scenes. This RL policy, which leverages oracle information, is subsequently distilled into a vision-based policy for real-world deployment. Because sim-to-real discrepancies stem mostly from texture variations and only rarely from geometric dimensions, we feed the vision-based policy depth information estimated by mature depth foundation models, which mitigates the sim-to-real gap. To tackle the challenge of limited views, we design a novel architecture for learning multi-view representations, allowing for comprehensive encoding of cluttered shelf scenes. This enables FetchBot to effectively minimize collisions while fetching objects from varying positions and depths, ensuring robust and safety-aware operation. Both simulation and real-robot experiments demonstrate FetchBot's superior generalization ability, particularly in handling a broad range of real-world scenarios, includ
Problem

Research questions and friction points this paper is trying to address.

Develop a sim-to-real framework for fetching objects from cluttered shelves
Ensure safety and minimize object disturbances
Generalize across diverse real-world scenarios
Innovation

Methods, ideas, or system contributions that make the work stand out.

Voxel-based simulated cluttered shelf scenes
Depth foundation models for vision policy
Multi-view representations for limited views
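The voxel-based scene generation idea can be illustrated with a minimal sketch. This is not the paper's generator: it assumes objects are axis-aligned boxes resting on the shelf floor, and the function name `generate_shelf_scene` and all parameters (grid size, object count, size range) are hypothetical choices made for illustration. The point is only that an occupancy grid makes overlap checks cheap, so many cluttered layouts can be sampled quickly.

```python
import numpy as np

def generate_shelf_scene(grid_shape=(40, 20, 15), num_objects=8,
                         size_range=(3, 8), max_tries=200, seed=0):
    """Place axis-aligned boxes on the shelf floor without overlap.

    All parameters are illustrative assumptions, measured in voxels.
    Returns the occupancy grid and a list of (x, y, z, sx, sy, sz) boxes.
    """
    rng = np.random.default_rng(seed)
    occupancy = np.zeros(grid_shape, dtype=bool)
    boxes = []
    for _ in range(num_objects):
        for _ in range(max_tries):
            # Sample a random box size (inclusive range) along x, y, z.
            size = rng.integers(size_range[0], size_range[1] + 1, size=3)
            # Largest corner index in x and y that keeps the box inside the shelf.
            max_corner = np.array(grid_shape[:2]) - size[:2]
            if np.any(max_corner < 0) or size[2] > grid_shape[2]:
                continue
            x, y = rng.integers(0, max_corner + 1)
            # Basic slicing returns a view, so writes go to `occupancy`.
            region = occupancy[x:x + size[0], y:y + size[1], 0:size[2]]
            if not region.any():
                region[...] = True
                boxes.append((int(x), int(y), 0, *map(int, size)))
                break
    return occupancy, boxes

occupancy, boxes = generate_shelf_scene()
print(f"placed {len(boxes)} objects, {int(occupancy.sum())} voxels occupied")
```

Varying `seed` yields different layouts. A real pipeline would map each box to an object mesh and load the scene into a physics simulator, which this sketch leaves out.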
Weiheng Liu
Institute of Automation, Chinese Academy of Sciences; Beijing Academy of Artificial Intelligence
Yuxuan Wan
CFCS, School of Computer Science, Peking University; Beijing Academy of Artificial Intelligence
Jilong Wang
CFCS, School of Computer Science, Peking University; Beijing Academy of Artificial Intelligence
Yuxuan Kuang
Carnegie Mellon University
Robotics, 3D Computer Vision, Machine Learning
Xuesong Shi
Galbot
robotic vision, heterogeneous computing, graph signal processing, SLAM
Haoran Li
Institute of Automation, Chinese Academy of Sciences
Dongbin Zhao
Institute of Automation, Chinese Academy of Sciences
Deep Reinforcement Learning, Adaptive Dynamic Programming, Game AI, Smart driving, robotics
Zhizheng Zhang
Beijing Academy of Artificial Intelligence
He Wang
CFCS, School of Computer Science, Peking University; Beijing Academy of Artificial Intelligence