🤖 AI Summary
The integration of large language models (LLMs) into human–robot interaction (HRI) lacks systematic empirical validation regarding trade-offs between subjective experience and objective performance metrics. Method: This study conducts a controlled, dual-track comparison between scripted and LLM-augmented industrial robots across real-world HRI tasks—approach, command interpretation, and object manipulation—using eye-tracking, NASA-TLX, and Godspeed questionnaires to jointly quantify subjective perception (e.g., engagement, anthropomorphism) and objective metrics (latency, energy consumption, task efficiency). Contribution/Results: LLM augmentation significantly enhances user engagement and perceived anthropomorphism; however, scripted control outperforms LLM-based control in task completion efficiency, visual attention focus, response latency, and energy efficiency—especially for structured, repetitive tasks. The study empirically delineates the operational boundaries of LLMs in HRI and provides evidence-based guidance for selecting interaction paradigms under resource constraints.
📝 Abstract
To achieve natural and intuitive interaction with people, HRI frameworks combine a wide array of methods for human perception, intention communication, human-aware navigation and collaborative action. In practice, when encountering unpredictable behavior of people or unexpected states of the environment, these frameworks may lack the ability to dynamically recognize such states, adapt and recover to resume the interaction. Large Language Models (LLMs), owing to their advanced reasoning capabilities and context retention, present a promising solution for enhancing robot adaptability. This potential, however, may not directly translate to improved interaction metrics. This paper considers a representative interaction with an industrial robot involving approach, instruction, and object manipulation, implemented in two conditions: (1) fully scripted and (2) including LLM-enhanced responses. We use gaze tracking and questionnaires to measure the participants' task efficiency, engagement, and robot perception. The results indicate higher subjective ratings for the LLM condition, but objective metrics show that the scripted condition performs comparably, particularly in efficiency and focus during simple tasks. We also note that the scripted condition may have an edge over LLM-enhanced responses in terms of response latency and energy consumption, especially for trivial and repetitive interactions.