🤖 AI Summary
Problem: Humanoid robots struggle to simultaneously ensure task performance and physical safety during dynamic human–robot interaction (HRI).
Method: This paper proposes a semantics-driven upper-body impedance control framework. It integrates vision-language models (VLMs) with retrieval-augmented generation (RAG), using first-person visual input and structured prompts for scene understanding and human proximity estimation. The result is embedded and matched against a curated database of validated scenarios, and the retrieved parameters are mapped through inverse kinematics to joint-level stiffness, damping, and velocity commands.
Results: Evaluated on tabletop manipulation tasks (wiping, object handovers, and liquid pouring), the system adapts stiffness, damping, and speed profiles to context, maintaining task success while improving safety both with and without a human present. Inference latency of up to 1.4 seconds limits responsiveness in highly dynamic settings, but the work shows that semantically grounded, context-aware impedance control is a viable path toward more trustworthy HRI.
📝 Abstract
Safe and trustworthy Human-Robot Interaction (HRI) requires robots not only to complete tasks but also to regulate impedance and speed according to scene context and human proximity. We present SafeHumanoid, an egocentric vision pipeline that links Vision-Language Models (VLMs) with Retrieval-Augmented Generation (RAG) to schedule impedance and velocity parameters for a humanoid robot. Egocentric frames are processed by a structured VLM prompt, embedded and matched against a curated database of validated scenarios, and mapped to joint-level impedance commands via inverse kinematics. We evaluate the system on tabletop manipulation tasks with and without human presence, including wiping, object handovers, and liquid pouring. The results show that the pipeline adapts stiffness, damping, and speed profiles in a context-aware manner, maintaining task success while improving safety. Although current inference latency (up to 1.4 s) limits responsiveness in highly dynamic settings, SafeHumanoid demonstrates that semantic grounding of impedance control is a viable path toward safer, standards-compliant humanoid collaboration.
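The retrieval step described in the abstract (embed the scene, match it against a database of validated scenarios, return impedance and speed parameters) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the names `ImpedanceProfile`, `Scenario`, and `retrieve_profile`, the three-dimensional toy embeddings, the numeric gains, the similarity threshold, and the conservative fallback used when no scenario matches well. Cosine similarity is used here as one common choice of matching metric; the paper's actual metric is not stated in the abstract.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ImpedanceProfile:
    """Hypothetical parameter set returned by retrieval (illustrative units)."""
    stiffness: float   # N/m
    damping: float     # N*s/m
    max_speed: float   # m/s


@dataclass(frozen=True)
class Scenario:
    """One validated database entry: a description, its embedding, its profile."""
    description: str
    embedding: tuple
    profile: ImpedanceProfile


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Assumed conservative default: compliant and slow when nothing matches well.
SAFE_FALLBACK = ImpedanceProfile(stiffness=100.0, damping=30.0, max_speed=0.05)


def retrieve_profile(query_embedding, database, min_similarity=0.8):
    """Return the profile of the closest scenario, or the fallback if none is close."""
    best = max(
        database,
        key=lambda s: cosine_similarity(query_embedding, s.embedding),
        default=None,
    )
    if best is None:
        return SAFE_FALLBACK
    if cosine_similarity(query_embedding, best.embedding) < min_similarity:
        return SAFE_FALLBACK
    return best.profile


# Toy database with made-up embeddings and gains.
DATABASE = [
    Scenario("wiping table, no human nearby", (1.0, 0.0, 0.0),
             ImpedanceProfile(stiffness=600.0, damping=50.0, max_speed=0.30)),
    Scenario("object handover, human close", (0.0, 1.0, 0.0),
             ImpedanceProfile(stiffness=150.0, damping=35.0, max_speed=0.10)),
]

handover_like = retrieve_profile((0.1, 0.9, 0.0), DATABASE)
unknown_scene = retrieve_profile((0.0, 0.0, 1.0), DATABASE)
```

In this sketch a query close to the handover entry retrieves the softer, slower profile, while a scene unlike anything in the database falls back to the conservative default. A real system would still need to map the retrieved parameters to joint-level commands, which the abstract attributes to inverse kinematics.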