HMR-1: Hierarchical Massage Robot with Vision-Language-Model for Embodied Healthcare

📅 2026-03-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses two gaps that limit current intelligent physiotherapy robotics: the absence of standardized evaluation benchmarks for embodied medical tasks and the scarcity of open-source multimodal acupoint massage datasets. To bridge them, the authors introduce MedMassage-12K, a multimodal acupoint massage dataset of 12,190 images and 174,177 QA pairs, and propose a hierarchical embodied massage framework. At the high level, a multimodal large language model (the authors fine-tune Qwen-VL) interprets natural-language instructions and localizes the requested acupoints; at the low level, a control module plans the massage trajectory from those locations. The work also benchmarks existing MLLMs on embodied massage tasks, and physical experiments show that the system can carry out massage operations in response to natural-language instructions. Both the dataset and code are publicly available.
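The two-level split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the grounding model answers with a pixel pair written as "(x, y)", that a depth reading and pinhole camera intrinsics are available to lift that pixel into 3D, and it substitutes a simple circular kneading path for the paper's low-level planner. All function names and numeric values are hypothetical.

```python
import math
import re
from dataclasses import dataclass
from typing import List, Tuple

Point3 = Tuple[float, float, float]


@dataclass
class CameraIntrinsics:
    """Pinhole intrinsics in pixels (hypothetical values; use calibration in practice)."""
    fx: float
    fy: float
    cx: float
    cy: float


# Matches a "(x, y)" pixel pair; "(LI4)" in an acupoint name has no comma, so it is skipped.
_COORD_RE = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_acupoint_answer(answer: str) -> Tuple[float, float]:
    """High level (assumed output format): pull the first (x, y) pair from an MLLM answer."""
    match = _COORD_RE.search(answer)
    if match is None:
        raise ValueError(f"no coordinate pair found in: {answer!r}")
    return float(match.group(1)), float(match.group(2))


def deproject(u: float, v: float, depth_m: float, k: CameraIntrinsics) -> Point3:
    """Lift pixel (u, v) with its depth in metres to a 3D point in the camera frame."""
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return x, y, depth_m


def circular_trajectory(center: Point3, radius_m: float = 0.01, n_points: int = 8) -> List[Point3]:
    """Low level (hypothetical stand-in): a closed circle of waypoints at constant depth.

    Treats the skin patch as perpendicular to the optical axis, a simplifying assumption.
    """
    cx, cy, cz = center
    return [
        (cx + radius_m * math.cos(2 * math.pi * i / n_points),
         cy + radius_m * math.sin(2 * math.pi * i / n_points),
         cz)
        for i in range(n_points + 1)  # the extra point closes the loop
    ]


if __name__ == "__main__":
    k = CameraIntrinsics(fx=600.0, fy=600.0, cx=320.0, cy=240.0)
    u, v = parse_acupoint_answer("Hegu (LI4) is at (412, 305).")
    target = deproject(u, v, depth_m=0.5, k=k)
    path = circular_trajectory(target)
    print(target)     # camera-frame point, metres
    print(len(path))  # 9 waypoints
```

Keeping the high-level output as a plain pixel target is what lets the two levels evolve independently in a sketch like this: a different grounding model or trajectory planner only has to agree on that interface.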

📝 Abstract

The rapid advancement of Embodied Intelligence has opened transformative opportunities in healthcare, particularly in physical therapy and rehabilitation. However, critical challenges remain in developing robust embodied healthcare solutions, such as the lack of standardized evaluation benchmarks and the scarcity of open-source multimodal acupoint massage datasets. To address these gaps, we construct MedMassage-12K, a multimodal dataset containing 12,190 images with 174,177 QA pairs, covering diverse lighting conditions and backgrounds. Furthermore, we propose a hierarchical embodied massage framework, which includes a high-level acupoint grounding module and a low-level control module. The high-level acupoint grounding module uses multimodal large language models to understand human language and identify acupoint locations, while the low-level control module provides the planned trajectory. Based on this, we evaluate existing MLLMs and establish a benchmark for embodied massage tasks. Additionally, we fine-tune the Qwen-VL model, demonstrating the framework's effectiveness. Physical experiments further confirm the practical applicability of the framework. Our dataset and code are publicly available at https://github.com/Xiaofeng-Han-Res/HMR-1.
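The abstract does not spell out how the benchmark scores acupoint localization. One common way to score such predictions, shown here purely as an assumption, is the mean pixel distance to the annotated acupoint plus the fraction of predictions landing within a fixed radius, with answers that could not be parsed into coordinates counted as misses. The 20-pixel threshold and the function name are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple

Point = Tuple[float, float]


def localization_metrics(
    preds: Sequence[Optional[Point]],
    gts: Sequence[Point],
    threshold_px: float = 20.0,
) -> Dict[str, float]:
    """Score predicted acupoint pixels against ground truth (hypothetical metric).

    A prediction of None means the model's answer could not be parsed; it counts as a miss.
    """
    if len(preds) != len(gts) or not gts:
        raise ValueError("preds and gts must be non-empty and the same length")
    errors = [math.dist(p, g) for p, g in zip(preds, gts) if p is not None]
    hits = sum(1 for e in errors if e <= threshold_px)
    return {
        # Mean Euclidean pixel error over parsed predictions only.
        "mean_error_px": sum(errors) / len(errors) if errors else float("nan"),
        # Fraction of all samples whose prediction falls within the threshold.
        "accuracy": hits / len(gts),
        # Fraction of answers that yielded coordinates at all.
        "parse_rate": len(errors) / len(gts),
    }


if __name__ == "__main__":
    preds = [(100.0, 100.0), (50.0, 50.0), None]
    gts = [(103.0, 104.0), (80.0, 90.0), (10.0, 10.0)]
    print(localization_metrics(preds, gts))
```

A fixed pixel threshold depends on image resolution; normalizing the distance by image or body-part size would make scores more comparable across images taken at different scales.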

Problem

Research questions and friction points this paper is trying to address.

Embodied Intelligence

Healthcare

Acupoint Massage

Multimodal Dataset

Evaluation Benchmark

Innovation

Methods, ideas, or system contributions that make the work stand out.

Embodied Intelligence

Vision-Language Model

Acupoint Grounding