AdaClearGrasp: Learning Adaptive Clearing for Zero-Shot Robust Dexterous Grasping in Densely Cluttered Environments

📅 2026-03-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses dexterous grasping in densely cluttered environments, where physical interference, visual occlusion, and unstable contacts often cause direct grasps to fail, while aggressive singulation can compromise safety. The authors propose AdaClearGrasp, a closed-loop decision-execution framework that uses language-guided reasoning to choose adaptively between a “grasp directly” and a “clear-then-grasp” strategy. A pretrained vision-language model produces the high-level plan; a reinforcement learning policy built on a relative hand-object distance representation executes the dexterous grasp and generalizes zero-shot across object geometries and physical properties; and visual feedback triggers replanning when execution fails. The study also introduces Clutter-Bench, described as the first simulation benchmark with graded clutter complexity, for evaluating language-conditioned dexterous grasping in clutter. Across 210 simulated and 18 real-world scenarios, the authors report significantly improved grasp success rates in densely cluttered settings.

📝 Abstract
In densely cluttered environments, physical interference, visual occlusions, and unstable contacts often cause direct dexterous grasping to fail, while aggressive singulation strategies may compromise safety. Enabling robots to adaptively decide whether to clear surrounding objects or directly grasp the target is therefore crucial for robust manipulation. We propose AdaClearGrasp, a closed-loop decision-execution framework for adaptive clearing and zero-shot dexterous grasping in densely cluttered environments. The framework formulates manipulation as a controllable high-level decision process that determines whether to directly grasp the target or first clear surrounding objects. A pretrained vision-language model (VLM) interprets visual observations and language task descriptions to reason about grasp interference and generate a high-level planning skeleton, which invokes structured atomic skills through a unified action interface. For dexterous grasping, we train a reinforcement learning policy with a relative hand-object distance representation, enabling zero-shot generalization across diverse object geometries and physical properties. During execution, visual feedback monitors outcomes and triggers replanning upon failures, forming a closed-loop correction mechanism. To evaluate language-conditioned dexterous grasping in clutter, we introduce Clutter-Bench, the first simulation benchmark with graded clutter complexity. It includes seven target objects across three clutter levels, yielding 210 task scenarios. We further perform sim-to-real experiments on three objects under three clutter levels (18 scenarios). Results demonstrate that AdaClearGrasp significantly improves grasp success rates in densely cluttered environments. For more videos and code, please visit our project website: https://chenzixuan99.github.io/adaclear-grasp.github.io/.
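
The closed-loop behaviour described in the abstract (plan, execute an atomic skill, check the outcome, replan on failure) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Scene` type, the `plan`, `execute`, and `run_episode` functions, and the `clear:`/`grasp:` skill strings are all hypothetical, and the planner is a simple rule standing in for the vision-language model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scene:
    """Hypothetical scene state; the paper's real observations are images."""
    blockers: List[str]           # objects judged to interfere with the target grasp
    target_grasped: bool = False  # outcome that visual feedback would report


def plan(scene: Scene) -> List[str]:
    """Stand-in for the VLM planner (assumption): clear one blocker, else grasp."""
    if scene.blockers:
        return [f"clear:{scene.blockers[0]}"]
    return ["grasp:target"]


def execute(scene: Scene, skill: str, grasp_ok: Callable[[], bool]) -> None:
    """Stand-in for the unified action interface (assumption)."""
    name, arg = skill.split(":", 1)
    if name == "clear":
        scene.blockers.remove(arg)
    elif name == "grasp":
        # grasp_ok simulates whether the dexterous grasp policy succeeded.
        scene.target_grasped = grasp_ok()


def run_episode(scene: Scene, grasp_ok: Callable[[], bool],
                max_rounds: int = 10) -> List[str]:
    """Plan, execute, check the outcome, and replan until success or budget."""
    log: List[str] = []
    for _ in range(max_rounds):
        if scene.target_grasped:  # feedback check before each replanning round
            break
        for skill in plan(scene):
            execute(scene, skill, grasp_ok)
            log.append(skill)
    return log
```

With two blockers and a grasp that fails once before succeeding, the loop clears both objects, attempts the grasp, detects the failure, and retries. In a real system the feedback check would come from perception rather than a boolean flag.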

Problem

Research questions and friction points this paper is trying to address.

dexterous grasping
cluttered environments
adaptive clearing
zero-shot manipulation
grasp interference

Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive clearing
zero-shot dexterous grasping
vision-language model
closed-loop manipulation
Clutter-Bench