SafeHumanoid: VLM-RAG-driven Control of Upper Body Impedance for Humanoid Robot

📅 2025-11-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Humanoid robots struggle to ensure both task performance and physical safety during dynamic human-robot interaction (HRI). Method: This paper proposes a semantics-driven upper-body impedance control framework that integrates vision-language models (VLMs) with retrieval-augmented generation (RAG). Egocentric (first-person) frames are processed by a structured prompt for scene understanding and human proximity estimation; the result is embedded, matched against a curated database of validated scenarios, and mapped through inverse kinematics to joint-level stiffness, damping, and velocity commands. Results: On tabletop manipulation tasks (wiping, object handovers, and liquid pouring), with and without a human present, the system adapts impedance and speed profiles to context, maintaining task success while improving safety. Inference latency of up to 1.4 s limits responsiveness in highly dynamic settings, but the work shows that semantically grounded, context-aware impedance control is a viable route to more trustworthy HRI.

📝 Abstract

Safe and trustworthy Human-Robot Interaction (HRI) requires robots not only to complete tasks but also to regulate impedance and speed according to scene context and human proximity. We present SafeHumanoid, an egocentric vision pipeline that links Vision-Language Models (VLMs) with Retrieval-Augmented Generation (RAG) to schedule impedance and velocity parameters for a humanoid robot. Egocentric frames are processed by a structured VLM prompt, embedded and matched against a curated database of validated scenarios, and mapped to joint-level impedance commands via inverse kinematics. We evaluate the system on tabletop manipulation tasks with and without human presence, including wiping, object handovers, and liquid pouring. The results show that the pipeline adapts stiffness, damping, and speed profiles in a context-aware manner, maintaining task success while improving safety. Although current inference latency (up to 1.4 s) limits responsiveness in highly dynamic settings, SafeHumanoid demonstrates that semantic grounding of impedance control is a viable path toward safer, standard-compliant humanoid collaboration.
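
The retrieval step described in the abstract (embed the scene description, then match it against a curated database of validated scenarios) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Scenario` fields, the three-dimensional toy embeddings, the numeric parameter values, the `min_similarity` threshold, and the conservative fallback entry are all assumptions made for illustration. A real system would obtain embeddings from an embedding model rather than hand-written vectors.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """Hypothetical database entry: a validated scene and its parameters."""
    name: str
    embedding: tuple      # toy scene embedding (assumed, not from the paper)
    stiffness: float      # illustrative joint stiffness, N*m/rad
    damping: float        # illustrative joint damping, N*m*s/rad
    speed_scale: float    # fraction of nominal speed


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Assumed fallback: soft and slow when no validated scenario matches well.
CONSERVATIVE = Scenario("conservative_default", (0.0, 0.0, 0.0), 10.0, 4.0, 0.2)

# Toy database; all values are placeholders, not results from the paper.
DATABASE = [
    Scenario("wiping_no_human", (1.0, 0.0, 0.1), 80.0, 6.0, 1.0),
    Scenario("handover_human_near", (0.1, 1.0, 0.2), 20.0, 5.0, 0.4),
    Scenario("pouring_human_near", (0.0, 0.6, 0.8), 40.0, 8.0, 0.3),
]


def retrieve(query, database, min_similarity=0.8, fallback=CONSERVATIVE):
    """Return the most similar scenario, or the fallback below the threshold."""
    best = max(database, key=lambda s: cosine_similarity(query, s.embedding))
    if cosine_similarity(query, best.embedding) < min_similarity:
        return fallback
    return best


if __name__ == "__main__":
    match = retrieve((0.2, 0.9, 0.3), DATABASE)
    print(match.name, match.stiffness, match.damping, match.speed_scale)
```

Returning a conservative default when no entry is similar enough is one possible design choice for unfamiliar scenes; the source does not state how unmatched scenes are handled.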

Problem

Research questions and friction points this paper is trying to address.

Adapting humanoid robot impedance and speed for safe human-robot interaction

Using vision-language models to schedule context-aware impedance parameters

Maintaining task success while improving safety through semantic control

Innovation

Methods, ideas, or system contributions that make the work stand out.

VLM-RAG pipeline schedules humanoid impedance parameters

Egocentric vision maps scenes to joint impedance commands

Semantic grounding enables context-aware stiffness and speed adaptation
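
To make "joint-level impedance commands" concrete, the sketch below shows the textbook joint-space impedance law, tau = K (q_des - q) + D (dq_des - dq), applied per joint. It is a generic illustration and not necessarily the controller used in SafeHumanoid; the function name `impedance_torque` and all gain values are assumptions. It shows why lowering stiffness yields a gentler response to the same position error.

```python
def impedance_torque(q, dq, q_des, dq_des, stiffness, damping):
    """Per-joint torque from a spring-damper law around the desired state.

    All arguments are equal-length sequences, one entry per joint.
    """
    lengths = {len(q), len(dq), len(q_des), len(dq_des), len(stiffness), len(damping)}
    if len(lengths) != 1:
        raise ValueError("all inputs must have one entry per joint")
    return [
        k * (qd_i - q_i) + d * (dqd_i - dq_i)
        for q_i, dq_i, qd_i, dqd_i, k, d in zip(q, dq, q_des, dq_des, stiffness, damping)
    ]


if __name__ == "__main__":
    q, dq = [0.0, 0.5], [0.0, 0.1]          # measured position (rad), velocity (rad/s)
    q_des, dq_des = [0.1, 0.5], [0.0, 0.0]  # desired position and velocity

    # Illustrative gains: a stiff setting versus a compliant setting.
    stiff = impedance_torque(q, dq, q_des, dq_des, [50.0, 50.0], [2.0, 2.0])
    soft = impedance_torque(q, dq, q_des, dq_des, [10.0, 10.0], [2.0, 2.0])
    print(stiff, soft)
```

With the same 0.1 rad error on the first joint, the stiff setting commands five times the torque of the compliant one, which is the kind of trade-off a context-aware scheduler would adjust when a human is nearby.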

🔎 Similar Papers

Optimizing Design and Control Methods for Using Collaborative Robots in Upper-Limb Rehabilitation