Toward Human-Robot Teaming: Learning Handover Behaviors from 3D Scenes

📅 2025-08-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the challenge of learning object handover policies for human-robot collaboration from monocular RGB images alone, without real-robot data collection or physical experimentation. The proposed end-to-end method bridges simulation and reality via cross-domain policy transfer. Its core innovation is the first application of sparse-view Gaussian splatting to 3D reconstruction of hand-object handover scenes; the reconstructed geometry is used to map virtual camera pose changes, derived from human demonstration videos, into executable robot arm motion commands. The framework integrates 3D scene reconstruction, visual imitation learning, and egocentric viewpoint transformation to learn handover behavior directly from single-view RGB video. Experiments in both the reconstructed scenes and real-world handovers demonstrate stable grasping and consistent avoidance of collisions with the human hand, improving the robustness and naturalness of handover execution.
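The mechanism at the heart of this summary is the camera-to-gripper mapping: because the camera is mounted on the gripper, a pose change of the virtual camera in the reconstructed scene corresponds, through the hand-eye extrinsic, to a pose change of the gripper. Below is a minimal sketch of that conjugation using 4x4 homogeneous matrices; the symbols (`T_gc`, a known gripper-to-camera transform from hand-eye calibration) are our assumptions, not notation from the paper:

```python
# Sketch: translate a virtual camera pose change into a gripper pose command.
# Assumes an eye-in-hand camera with known gripper-to-camera extrinsic T_gc.
import numpy as np

def delta_pose(T_cam_prev: np.ndarray, T_cam_next: np.ndarray) -> np.ndarray:
    """Relative SE(3) motion between two 4x4 world-frame camera poses."""
    return np.linalg.inv(T_cam_prev) @ T_cam_next

def gripper_command(T_grip_world: np.ndarray,
                    T_cam_prev: np.ndarray,
                    T_cam_next: np.ndarray,
                    T_gc: np.ndarray) -> np.ndarray:
    """Map a simulated camera pose change to a new gripper target pose."""
    d_cam = delta_pose(T_cam_prev, T_cam_next)    # camera motion, camera frame
    d_grip = T_gc @ d_cam @ np.linalg.inv(T_gc)   # conjugate into gripper frame
    return T_grip_world @ d_grip                  # apply as a relative command
```

This follows from the rigid chain T_world_cam = T_world_grip @ T_gc: any camera delta expressed in the camera frame becomes the same rigid motion in the gripper frame after conjugation by the extrinsic.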

📝 Abstract
Human-robot teaming (HRT) systems often rely on large-scale datasets of human and robot interactions, especially for close-proximity collaboration tasks such as human-robot handovers. Learning robot manipulation policies from raw, real-world image data requires a large number of robot-action trials in the physical environment. Although simulation training offers a cost-effective alternative, the visual domain gap between simulation and the robot workspace remains a major limitation. We introduce a method for training HRT policies, focusing on human-to-robot handovers, solely from RGB images, without the need for real-robot training or real-robot data collection. The goal is to enable the robot to reliably receive objects from a human with stable grasping while avoiding collisions with the human hand. The proposed policy learner leverages sparse-view Gaussian Splatting reconstruction of human-to-robot handover scenes to generate robot demonstrations containing image-action pairs captured with a camera mounted on the robot gripper. As a result, simulated camera pose changes in the reconstructed scene can be directly translated into gripper pose changes. Experiments in both the Gaussian-Splatting-reconstructed scene and real-world human-to-robot handovers demonstrate that our method serves as a new and effective representation for the human-to-robot handover task, contributing to more seamless and robust HRT.
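The abstract's demonstration-generation step (image-action pairs rendered from the reconstruction with a gripper-mounted virtual camera) can be pictured as the loop below. This is a sketch under our assumptions: `scene.render` stands in for a Gaussian-Splatting renderer and is not the authors' API, and the grasp open/close action is omitted for brevity:

```python
# Sketch: build a demonstration of (image, action) pairs by rendering the
# reconstructed handover scene along a virtual camera trajectory.
import numpy as np

def generate_demo(scene, camera_poses):
    """camera_poses: list of 4x4 world-frame camera poses along a handover approach."""
    demo = []
    for T_prev, T_next in zip(camera_poses, camera_poses[1:]):
        image = scene.render(T_prev)             # RGB view from the reconstruction
        action = np.linalg.inv(T_prev) @ T_next  # relative SE(3) motion as the label
        demo.append((image, action))
    return demo
```

Because every action label is a relative pose of the same eye-in-hand camera, the pairs can supervise a policy whose outputs are directly executable as gripper pose changes, which is the translation the abstract describes.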
Problem

Research questions and friction points this paper is trying to address.

Learning robot handover policies from RGB images
Bridging the visual domain gap between simulation and reality
Enabling stable grasping and collision-free human-robot handovers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses sparse-view Gaussian Splatting to reconstruct handover scenes
Trains handover policies from RGB images only (a training sketch follows this list)
Translates simulated camera pose changes into gripper pose commands
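For the RGB-only policy point above, plain behavior cloning is one way the rendered image-action pairs could be consumed; the tiny CNN and 6-DoF action parameterization below are illustrative assumptions, not the paper's architecture:

```python
# Sketch: behavior cloning from RGB frames to 6-DoF relative gripper motions.
import torch
import torch.nn as nn

class HandoverPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(            # small CNN image encoder
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, 6)             # translation + axis-angle rotation

    def forward(self, rgb):
        return self.head(self.encoder(rgb))

def train_step(policy, optimizer, rgb, action_6dof):
    """One supervised step on a batch of (image, relative-pose) pairs."""
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(policy(rgb), action_6dof)
    loss.backward()
    optimizer.step()
    return loss.item()
```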