AI Summary
This work addresses the challenge of enabling robots to learn transferable 6-DoF object manipulation skills from only a few uncalibrated, monocular RGB videos of human demonstrations. We propose a Neural Affordance Function that jointly models geometric, visual, and functional cues, yielding a lightweight, reusable memory of manipulation skills. Our method employs a continuous-query mechanism over multimodal features to achieve an end-to-end mapping from affordance representations to precise manipulation policies, supporting both coarse-to-fine refinement and cross-task, cross-environment policy transfer. Evaluated on 13 simulated and real-world manipulation tasks, our approach improves the average success rate by 14.9 percentage points over prior methods while requiring only 2-3 demonstration videos per task. The framework generalizes across objects, environments, and task variants while remaining computationally efficient and scalable.
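To make the "continuous query" idea concrete, here is a minimal sketch (not the authors' code) of what such a Neural Affordance Function interface might look like: a small coordinate network that maps continuous 3D query points to geometric, visual, and affordance outputs, with one lightweight network per skill forming the memory bank. All names, dimensions, and the MLP architecture are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NeuralAffordanceFunction(nn.Module):
    """Hypothetical sketch: a coordinate MLP queried at continuous 3D points."""

    def __init__(self, hidden: int = 128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Separate heads for the three cue types distilled from the videos.
        self.geometry_head = nn.Linear(hidden, 1)    # e.g., a signed distance
        self.visual_head = nn.Linear(hidden, 3)      # e.g., RGB appearance
        self.affordance_head = nn.Linear(hidden, 1)  # e.g., a graspability score

    def forward(self, xyz: torch.Tensor) -> dict:
        h = self.backbone(xyz)  # xyz: (N, 3) continuous query points
        return {
            "geometry": self.geometry_head(h),
            "visual": self.visual_head(h),
            "affordance": torch.sigmoid(self.affordance_head(h)),
        }

# One compact network per demonstrated skill forms the skill memory bank.
skill_memory = {"open_drawer": NeuralAffordanceFunction()}
queries = torch.rand(1024, 3)  # arbitrary points in the object frame
features = skill_memory["open_drawer"](queries)
```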
Abstract
We present Actron3D, a framework that enables robots to acquire transferable 6-DoF manipulation skills from just a few monocular, uncalibrated, RGB-only human videos. At its core lies the Neural Affordance Function, a compact object-centric representation that distills actionable cues (geometry, visual appearance, and affordance) from diverse uncalibrated videos into a lightweight neural network, forming a memory bank of manipulation skills. During deployment, our pipeline retrieves the relevant affordance functions and transfers precise 6-DoF manipulation policies via coarse-to-fine optimization, enabled by continuous queries to the multimodal features encoded in the neural functions. Experiments in both simulation and the real world demonstrate that Actron3D significantly outperforms prior methods, achieving a 14.9 percentage point improvement in average success rate across 13 tasks while requiring only 2-3 demonstration videos per task.
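The deployment-time coarse-to-fine step can be sketched as follows, reusing the hypothetical `NeuralAffordanceFunction` above: a coarse pass ranks sampled candidates against the retrieved affordance field, and a fine pass refines the best candidate by gradient ascent through continuous queries. The pose parameterization (position only, for brevity; a full version would also optimize orientation) and the loss are illustrative assumptions, not the paper's exact formulation.

```python
import torch

net = NeuralAffordanceFunction()  # retrieved from the skill memory bank

# Coarse stage: rank randomly sampled gripper positions by affordance score.
candidates = torch.rand(256, 3)
scores = net(candidates)["affordance"].squeeze(-1)
best = candidates[scores.argmax()].clone().requires_grad_(True)

# Fine stage: locally maximize the affordance field around the best candidate,
# exploiting the fact that the neural function is differentiable everywhere.
opt = torch.optim.Adam([best], lr=1e-2)
for _ in range(100):
    opt.zero_grad()
    loss = -net(best.unsqueeze(0))["affordance"].sum()
    loss.backward()
    opt.step()
```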