🤖 AI Summary
In physical human-robot interaction (pHRI), robots rarely express their intent in natural language, which impedes user comprehension and collaborative efficiency. To address this, we propose CoRI, the first task-agnostic, vision-language-driven intent communication framework. Our method integrates human pose estimation with visual encoding of the planned 3D trajectory, leveraging a multimodal vision-language model (VLM) to jointly infer high-level intent, motion dynamics, and the collaborative actions required of the user, generating context-adaptive natural language descriptions. Unlike prior approaches that rely on handcrafted rules or task-specific templates, ours enables end-to-end, cross-task generalizable intent-language alignment. Evaluated on real-world assistive tasks (feeding, bathing, and shaving), our framework significantly improves communication clarity (p < 0.01) and increases user accuracy in interpreting both robot intent and the required collaboration by 42%.
📝 Abstract
Clear communication of robot intent fosters transparency and interpretability in physical human-robot interaction (pHRI), particularly during assistive tasks involving direct human-robot contact. We introduce CoRI, a pipeline that automatically generates natural language communication of a robot's upcoming actions directly from its motion plan and visual perception. Our pipeline first processes the robot's image view to identify human poses and key environmental features. It then encodes the planned 3D spatial trajectory (including velocity and force) onto this view, visually grounding the path and its dynamics. CoRI queries a vision-language model with this visual representation to interpret the planned action within the visual context before generating concise, user-directed statements, without relying on task-specific information. Results from a user study involving robot-assisted feeding, bathing, and shaving tasks across two different robots indicate that CoRI yields a statistically significant improvement in communication clarity over a baseline communication strategy. Specifically, CoRI effectively conveys not only the robot's high-level intentions but also crucial details about its motion and any collaborative user action needed.
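The pipeline stages described above (summarizing the planned trajectory's dynamics and composing a VLM query around the annotated camera view) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the class `TrajectoryPoint`, the functions `summarize_dynamics` and `build_vlm_query`, and the velocity/force thresholds are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class TrajectoryPoint:
    """One waypoint of the robot's planned motion (hypothetical schema)."""
    xyz: Tuple[float, float, float]  # 3D position in the robot base frame (m)
    velocity: float                  # planned speed at this point (m/s)
    force: float                     # planned contact force at this point (N)

def summarize_dynamics(traj: List[TrajectoryPoint]) -> str:
    """Reduce the trajectory to coarse, human-readable dynamics labels.
    Thresholds here are illustrative, not from the paper."""
    peak_v = max(p.velocity for p in traj)
    peak_f = max(p.force for p in traj)
    speed = "slow" if peak_v < 0.05 else "moderate" if peak_v < 0.2 else "fast"
    contact = "light contact" if peak_f < 5.0 else "firm contact"
    return f"{speed} motion with {contact} (peak {peak_v:.2f} m/s, {peak_f:.1f} N)"

def build_vlm_query(scene_context: str, dynamics: str) -> str:
    """Compose the text side of a VLM query; in the full pipeline the image
    side would carry the camera view with the projected trajectory overlay."""
    return (
        "The attached image shows the robot's camera view with its planned "
        f"path drawn on it. The motion is {dynamics}. {scene_context} "
        "In one or two sentences addressed to the user, state what the robot "
        "is about to do and any action the user should take."
    )

# Example: a short approach-and-contact trajectory.
traj = [
    TrajectoryPoint((0.30, 0.00, 0.40), velocity=0.03, force=1.0),
    TrajectoryPoint((0.35, 0.00, 0.35), velocity=0.04, force=6.5),
]
prompt = build_vlm_query("A person's face is visible near the path.",
                         summarize_dynamics(traj))
print(prompt)
```

The key design point this sketch mirrors is that the query carries no task-specific template: only the annotated view, coarse dynamics, and a generic instruction, leaving intent interpretation to the VLM.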