IFG: Internet-Scale Guidance for Functional Grasping Generation

📅 2025-11-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current large vision models excel at semantic understanding but lack fine-grained geometric perception, hindering precise, function-aware 3D grasping for dexterous robotic hands. To address this, we propose a fully automatic, annotation-free semantic grasping framework. First, internet-scale vision models guide force-closure grasp generation in simulation, establishing semantic-geometric correspondences. Second, knowledge distillation transfers the simulated grasp priors to a lightweight point-cloud diffusion model, enabling real-time prediction of functional-part-aligned grasp poses directly from single-frame RGB-D point clouds. Our approach is the first to jointly integrate global semantic reasoning and local geometric modeling—without any real-world human annotations—achieving high success rates for functional grasping in cluttered scenes. It significantly enhances generalization of dexterous manipulation in open, unstructured environments.
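The summary's second stage distills simulated grasp priors into a point-cloud diffusion model. As a rough sketch of the standard DDPM-style denoising objective such a distillation would optimize (the noise schedule, pose dimensionality, and function names here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

# Illustrative linear noise schedule; the paper's actual schedule is unknown.
T = 100
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

def noisy_grasp(x0, t, rng):
    """DDPM forward process: sample x_t ~ q(x_t | x_0) in closed form.
    x0 is a grasp pose vector (e.g. wrist pose plus finger joints) produced
    by the simulated teacher; eps is the noise the student must recover."""
    eps = rng.standard_normal(x0.shape)
    xt = np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return xt, eps

def distillation_loss(predict_eps, x0, t, rng):
    """Denoising loss for one teacher grasp: the student predict_eps
    (conditioned in practice on a point-cloud embedding) should output
    the noise that corrupted the teacher's grasp pose."""
    xt, eps = noisy_grasp(x0, t, rng)
    return float(np.mean((predict_eps(xt, t) - eps) ** 2))
```

An oracle predictor that returned the true noise would drive this loss to zero; in the paper's setting the predictor additionally conditions on the single-frame RGB-D point cloud, which is what lets it run in real time at inference.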

📝 Abstract
Large Vision Models trained on internet-scale data have demonstrated strong capabilities in segmenting and semantically understanding object parts, even in cluttered, crowded scenes. However, while these models can direct a robot toward the general region of an object, they lack the geometric understanding required to precisely control dexterous robotic hands for 3D grasping. To overcome this, our key insight is to leverage simulation with a force-closure grasping generation pipeline that understands local geometries of the hand and object in the scene. Because this pipeline is slow and requires ground-truth observations, the resulting data is distilled into a diffusion model that operates in real-time on camera point clouds. By combining the global semantic understanding of internet-scale models with the geometric precision of a simulation-based, locally-aware force-closure pipeline, our approach achieves high-performance semantic grasping without any manually collected training data. For visualizations, please visit our website at https://ifgrasping.github.io/
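The force-closure pipeline the abstract describes rests on a classical criterion: a grasp is in force closure when the convex hull of its contact wrenches strictly contains the origin of 6D wrench space (the Ferrari-Canny test). A minimal sketch of that test follows; the friction coefficient, cone discretization, and function names are illustrative assumptions, not the paper's code.

```python
import numpy as np
from scipy.spatial import ConvexHull

def friction_cone_wrenches(points, normals, mu=0.5, n_edges=8):
    """Approximate each contact's friction cone with n_edges force directions
    and form the corresponding 6D wrenches [force, torque = p x f]."""
    wrenches = []
    for p, n in zip(points, normals):
        n = n / np.linalg.norm(n)
        # Build a tangent basis orthogonal to the contact normal.
        t1 = np.cross(n, [1.0, 0.0, 0.0])
        if np.linalg.norm(t1) < 1e-6:          # normal was parallel to x
            t1 = np.cross(n, [0.0, 1.0, 0.0])
        t1 /= np.linalg.norm(t1)
        t2 = np.cross(n, t1)
        for k in range(n_edges):
            a = 2.0 * np.pi * k / n_edges
            f = n + mu * (np.cos(a) * t1 + np.sin(a) * t2)
            f /= np.linalg.norm(f)
            wrenches.append(np.concatenate([f, np.cross(p, f)]))
    return np.asarray(wrenches)

def is_force_closure(points, normals, mu=0.5):
    """Ferrari-Canny style test: force closure iff the origin lies strictly
    inside the convex hull of the discretized contact wrenches."""
    W = friction_cone_wrenches(points, normals, mu)
    if len(W) < 7:           # need at least 7 points to enclose the origin in 6D
        return False
    try:
        hull = ConvexHull(W)
    except Exception:        # wrench set is degenerate (flat in 6D): no closure
        return False
    # Facets satisfy A @ x + b <= 0 for interior points, with unit normals;
    # the origin is strictly inside iff every offset b is negative.
    return bool(np.max(hull.equations[:, -1]) < -1e-6)
```

For example, six friction contacts at the face centers of a cube pass this test, while two antipodal point contacts fail it (their wrenches span only five of the six wrench dimensions). The simulation pipeline in the paper would run such a check on candidate hand poses, keeping only grasps that can resist arbitrary disturbance wrenches.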
Problem

Research questions and friction points this paper is trying to address.

Generating functional grasps using internet-scale vision models
Overcoming geometric limitations for dexterous robotic grasping
Combining semantic understanding with simulation-based geometric precision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging simulation for force-closure grasping generation
Distilling simulation data into real-time diffusion model
Combining semantic understanding with geometric precision