A Flexible Field-Based Policy Learning Framework for Diverse Robotic Systems and Sensors

📅 2025-12-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two key challenges in cross-robot visuomotor learning: poor category-level generalization in manipulation tasks and high heterogeneity across robotic platforms and sensors. We propose a modular, field-driven policy learning framework. Methodologically, it integrates diffusion-based policy control with D3Fields-derived 3D semantic scene representations, and introduces a unified configuration layer enabling plug-and-play compatibility with diverse robotic arms (e.g., UR5, Franka) and heterogeneous depth cameras (e.g., Azure Kinect, RealSense). It further incorporates a low-latency control stack, multi-sensor calibration fusion, and teleoperation-assisted data collection. Our core contribution is the first semantic-field-mediated cross-platform policy transfer mechanism, enabling category-level generalization. In a block grasping-and-lifting task, the framework achieves an 80% success rate with only 100 demonstrations, demonstrating significantly improved robustness and scalability across robots and sensing modalities.
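The unified configuration layer can be pictured as a single config that selects the robot arm and depth camera, resolved by a small factory so the rest of the pipeline stays unchanged. The sketch below is illustrative only; the names (`build_setup`, `Driver`, the registry entries) are hypothetical and not the paper's actual API.

```python
from dataclasses import dataclass

@dataclass
class Driver:
    kind: str   # "robot" or "camera"
    model: str  # e.g. "ur5", "franka", "azure_kinect", "realsense"

# Hypothetical registry of supported hardware; swapping platforms means
# editing only the config, not the policy or data-collection code.
ROBOTS = {"ur5", "franka"}
CAMERAS = {"azure_kinect", "realsense"}

def build_setup(config: dict) -> dict:
    """Resolve a plain config dict into concrete driver stubs."""
    robot, camera = config["robot"], config["camera"]
    if robot not in ROBOTS:
        raise ValueError(f"unsupported robot: {robot}")
    if camera not in CAMERAS:
        raise ValueError(f"unsupported camera: {camera}")
    return {
        "robot": Driver("robot", robot),
        "camera": Driver("camera", camera),
    }

# Switching from a UR5/Kinect rig to a Franka/RealSense rig is a
# one-line config change.
setup = build_setup({"robot": "ur5", "camera": "azure_kinect"})
print(setup["robot"].model)  # ur5
```

This mirrors the plug-and-play idea described above: the policy and data-collection code depend only on the resolved driver interface, never on the specific hardware.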

📝 Abstract
We present a cross-robot visuomotor learning framework that integrates diffusion-policy-based control with 3D semantic scene representations from D3Fields to enable category-level generalization in manipulation. Its modular design supports diverse robot and camera configurations, including UR5 arms with Microsoft Azure Kinect arrays and bimanual manipulators with Intel RealSense sensors, through a low-latency control stack and intuitive teleoperation. A unified configuration layer enables seamless switching between setups for flexible data collection, training, and evaluation. In a grasp-and-lift block task, the framework achieved an 80% success rate after only 100 demonstration episodes, demonstrating robust skill transfer between platforms and sensing modalities. This design paves the way for scalable real-world studies in cross-robot generalization.
Problem

Research questions and friction points this paper is trying to address.

Achieving category-level generalization in robotic manipulation tasks
Supporting diverse robot and camera configurations flexibly
Transferring skills robustly across platforms and sensors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion policy combined with 3D semantic scene representations
Modular design supporting diverse robot and camera configurations
Unified configuration layer enabling flexible data collection
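The diffusion-policy component listed above can be sketched as an iterative denoising loop: an action is sampled from Gaussian noise and refined step by step, conditioned on an observation. The denoiser below is a toy stand-in (a linear function), not the paper's trained network; all names and the update rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def noise_pred(action, obs, t):
    # Stub for the learned denoiser eps_theta(a_t, obs, t).
    # A toy linear function stands in for a trained network here.
    return 0.1 * action - 0.05 * obs

def sample_action(obs, steps=10):
    """Sample an action by iteratively denoising from Gaussian noise."""
    a = rng.standard_normal(obs.shape)  # a_T ~ N(0, I)
    for t in range(steps, 0, -1):
        eps = noise_pred(a, obs, t)
        a = a - eps                     # simplified denoising update
    return a

obs = np.ones(2)                        # placeholder observation feature
action = sample_action(obs)
print(action.shape)                     # (2,)
```

With this toy denoiser the update contracts toward a fixed point (here 0.5·obs), which loosely mirrors how a trained diffusion policy pulls noise toward the demonstrated action distribution.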