🤖 AI Summary
To address the challenge of high-level collaborative control of heterogeneous robot teams in dynamic environments, this paper proposes a large language model (LLM)-driven framework for understanding gesture intent. Methodologically, the system integrates lightweight visual perception with LLM-based semantic parsing to convert real-time hand poses into structured natural-language descriptions, and employs a context-aware robot selection module that assigns each task to a suitable agent without requiring the user to specify a target explicitly. The key contribution is to move beyond conventional gesture-to-command mapping by elevating gesture interaction to intent-driven scheduling and coordination of multiple robots, which the paper positions as a first in the field. Experimental results show that the framework supports flexible scaling of the robot team and context-sensitive natural interaction in dynamic settings, substantially improving the capability and practical performance of human–robot collaboration.
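To make the perception-to-text step concrete, below is a minimal sketch (not the paper's code) of how detected hand landmarks might be converted into the kind of structured textual description the LLM consumes. The 21-point landmark layout, the finger-extension heuristic, and the wording of the output are all illustrative assumptions.

```python
# Sketch: turn 21 hand-landmark points (e.g., from an off-the-shelf detector
# such as MediaPipe Hands) into a short structured description for an LLM.
# Indices and thresholds below are assumptions, not GestOS's actual pipeline.

FINGERS = {
    "thumb":  (2, 4),    # (proximal joint index, fingertip index)
    "index":  (6, 8),
    "middle": (10, 12),
    "ring":   (14, 16),
    "pinky":  (18, 20),
}

def describe_hand(landmarks):
    """landmarks: list of 21 (x, y) tuples in normalized image coordinates,
    ordered as in the common 21-point hand model (wrist = index 0)."""
    wrist = landmarks[0]
    extended = []
    for name, (joint, tip) in FINGERS.items():
        # Heuristic: a finger counts as extended if its tip lies farther
        # from the wrist than its proximal joint.
        d_tip = (landmarks[tip][0] - wrist[0]) ** 2 + (landmarks[tip][1] - wrist[1]) ** 2
        d_joint = (landmarks[joint][0] - wrist[0]) ** 2 + (landmarks[joint][1] - wrist[1]) ** 2
        if d_tip > d_joint:
            extended.append(name)

    # Structured text the LLM receives instead of raw coordinates.
    return (
        f"Hand pose: {len(extended)} finger(s) extended "
        f"({', '.join(extended) if extended else 'none'}); "
        f"wrist at x={wrist[0]:.2f}, y={wrist[1]:.2f} in the image."
    )

# Example: an open palm, with every fingertip farther from the wrist than its joint.
if __name__ == "__main__":
    demo = [(0.5, 0.9)] + [(0.5 + 0.02 * i, 0.9 - 0.03 * i) for i in range(1, 21)]
    print(describe_hand(demo))
```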
📝 Abstract
We present GestOS, a gesture-based operating system for high-level control of heterogeneous robot teams. Unlike prior systems that map gestures to fixed commands or single-agent actions, GestOS interprets hand gestures semantically and dynamically distributes tasks across multiple robots based on their capabilities, current state, and supported instruction sets. The system combines lightweight visual perception with large language model (LLM) reasoning: hand poses are converted into structured textual descriptions, which the LLM uses to infer intent and generate robot-specific commands. A robot selection module ensures that each gesture-triggered task is matched to the most suitable agent in real time. This architecture enables context-aware, adaptive control without requiring explicit user specification of targets or commands. By advancing gesture interaction from recognition to intelligent orchestration, GestOS supports scalable, flexible, and user-friendly collaboration with robotic systems in dynamic environments.
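As an illustration of the orchestration step, here is a minimal sketch, under stated assumptions, of how a context-aware robot selection module might prompt an LLM with the team's capabilities, current states, and instruction sets alongside the gesture description, and parse back a robot-specific command. The robot registry, prompt wording, and the `build_selection_prompt`/`dispatch` helpers are hypothetical; `llm` stands for any text-in, text-out model call.

```python
# Sketch of a robot selection module: the LLM sees every robot's capabilities,
# state, and instruction set, plus the structured gesture description, and
# returns a JSON choice. All names and fields here are illustrative assumptions.

import json

ROBOTS = [
    {"name": "arm_1",   "type": "manipulator", "state": "idle",
     "commands": ["pick <object>", "place <location>", "wave"]},
    {"name": "ugv_2",   "type": "mobile base", "state": "charging",
     "commands": ["goto <location>", "follow user", "stop"]},
    {"name": "drone_3", "type": "quadrotor",   "state": "idle",
     "commands": ["take_off", "land", "survey <area>"]},
]

def build_selection_prompt(gesture_description: str) -> str:
    robot_block = json.dumps(ROBOTS, indent=2)
    return (
        "You coordinate a team of heterogeneous robots.\n"
        f"Available robots (capabilities, current state, instruction set):\n{robot_block}\n\n"
        f"Observed gesture: {gesture_description}\n\n"
        "Infer the user's intent, choose the single most suitable robot "
        "(prefer idle robots whose instruction set covers the intent), and reply "
        'with JSON only: {"robot": <name>, "command": <command string>, "reason": <one sentence>}.'
    )

def dispatch(gesture_description: str, llm) -> dict:
    """llm: callable(str) -> str, e.g. a thin wrapper around any chat-completion API."""
    reply = llm(build_selection_prompt(gesture_description))
    return json.loads(reply)  # the prompt constrains the reply to JSON

# Usage with a stubbed LLM, just to show the end-to-end data flow.
if __name__ == "__main__":
    fake_llm = lambda prompt: (
        '{"robot": "arm_1", "command": "wave", '
        '"reason": "An open palm reads as a greeting and the arm is idle."}'
    )
    print(dispatch("Hand pose: 5 finger(s) extended (open palm)", fake_llm))
```

Constraining the reply to JSON keeps the selection auditable and lets the same prompt scale as robots join or leave the team, since only the registry passed into the prompt changes.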