MOMO: A framework for seamless physical, verbal, and graphical robot skill learning and adaptation

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Industrial robots lack multimodal, safe, and flexible skill adaptation mechanisms that non-expert users can operate, which limits their ability to cope with task and environmental variations. This work proposes an interactive framework that combines three complementary modalities: kinesthetic teaching for precise spatial corrections, natural language instructions for high-level semantic changes, and a graphical web interface for inspecting and editing trajectories and parameters. Key contributions include a tool-based large language model architecture, in which the LLM selects and parameterizes predefined functions instead of generating code, for safe semantic adaptation; the generalization of that architecture from Kernelized Movement Primitives (KMPs) to ergodic control, which enables voice-commanded surface finishing; and the combination of energy-based human-intention detection with probabilistic Virtual Fixtures for guided demonstration recording. The framework was validated on a 7-DoF torque-controlled robot at the Automatica 2025 trade fair.

📝 Abstract

Industrial robot applications require increasingly flexible systems that non-expert users can easily adapt for varying tasks and environments. However, different adaptations benefit from different interaction modalities. We present an interactive framework that enables robot skill adaptation through three complementary modalities: kinesthetic touch for precise spatial corrections, natural language for high-level semantic modifications, and a graphical web interface for visualizing geometric relations and trajectories, inspecting and adjusting parameters, and editing via-points by drag-and-drop. The framework integrates five components: energy-based human-intention detection, a tool-based LLM architecture (where the LLM selects and parameterizes predefined functions rather than generating code) for safe natural language adaptation, Kernelized Movement Primitives (KMPs) for motion encoding, probabilistic Virtual Fixtures for guided demonstration recording, and ergodic control for surface finishing. We demonstrate that this tool-based LLM architecture generalizes skill adaptation from KMPs to ergodic control, enabling voice-commanded surface finishing. Validation on a 7-DoF torque-controlled robot at the Automatica 2025 trade fair demonstrates the practical applicability of our approach in industrial settings.
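The tool-based idea in the abstract (the LLM picks and parameterizes predefined functions instead of writing code) can be illustrated with a minimal sketch. Everything named here is hypothetical and not taken from the paper: the tool names `scale_speed` and `shift_height`, the parameter bounds, the skill dictionary layout, and the JSON call format are assumptions chosen only to show the pattern of validating a model's request before it touches a skill.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class Tool:
    """A predefined function plus the allowed range of each parameter."""
    func: Callable[..., dict]
    bounds: Dict[str, Tuple[float, float]]


def scale_speed(skill: dict, factor: float) -> dict:
    # Hypothetical tool: return a copy of the skill with scaled speed.
    return {**skill, "speed": skill["speed"] * factor}


def shift_height(skill: dict, dz: float) -> dict:
    # Hypothetical tool: shift the z coordinate of every via-point by dz.
    points = [[x, y, z + dz] for x, y, z in skill["via_points"]]
    return {**skill, "via_points": points}


# Assumed whitelist; the bounds are illustrative, not values from the paper.
TOOLS: Dict[str, Tool] = {
    "scale_speed": Tool(scale_speed, {"factor": (0.1, 1.5)}),
    "shift_height": Tool(shift_height, {"dz": (-0.05, 0.05)}),
}


def dispatch(skill: dict, llm_output: str) -> dict:
    """Validate a JSON tool call (as a model might emit it) and apply it."""
    call = json.loads(llm_output)  # malformed JSON raises a ValueError subclass
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    tool = TOOLS.get(call.get("tool"))
    if tool is None:
        raise ValueError(f"unknown tool: {call.get('tool')!r}")
    args = call.get("args", {})
    if not isinstance(args, dict) or set(args) != set(tool.bounds):
        raise ValueError("arguments do not match the tool signature")
    for name, value in args.items():
        low, high = tool.bounds[name]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not low <= value <= high:
            raise ValueError(f"parameter {name!r} outside [{low}, {high}]")
    return tool.func(skill, **args)


if __name__ == "__main__":
    skill = {"speed": 0.2, "via_points": [[0.4, 0.0, 0.3], [0.5, 0.1, 0.3]]}
    request = '{"tool": "scale_speed", "args": {"factor": 0.5}}'
    print(dispatch(skill, request)["speed"])
```

In this sketch the model's output is treated as untrusted data: a request naming an unlisted function, carrying unexpected arguments, or exceeding a bound is rejected before any skill parameter changes, and the original skill dictionary is never mutated.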

Problem

Research questions and friction points this paper is trying to address.

robot skill adaptation

multi-modal interaction

non-expert users

industrial robotics

human-robot collaboration

Innovation

Methods, ideas, or system contributions that make the work stand out.

tool-based LLM

Kernelized Movement Primitives

ergodic control