When the Inference Meets the Explicitness or Why Multimodality Can Make Us Forget About the Perfect Predictor

📅 2026-02-21
🤖 AI Summary
This study addresses the challenge of achieving real-time accuracy in human-robot collaborative carrying tasks, where relying solely on intention prediction is vulnerable to the inherent randomness of human behavior. The work compares implicit intention prediction systems, based on force and velocity, with explicit communication modalities such as buttons and voice commands, and further investigates hybrid strategies that integrate both approaches. The experiments use IVO, a custom mobile social robot equipped with a force sensor, LiDAR, an enhanced velocity prediction algorithm, speech recognition, and a button interface. They reveal that users prefer interaction methods that feel natural to them, even when those methods have higher failure rates. Moreover, once prediction performance surpasses a certain threshold, further improvements yield diminishing returns in user experience. The preferred collaboration paradigm is a balanced combination of predictive and explicit methods, which reconciles task efficiency with user satisfaction.

📝 Abstract
Although predictors and inference systems that try to anticipate human intentions are common in the literature, the uncertainty of these models, caused by the randomness of human behavior, has led some authors to advocate communication systems that explicitly elicit human intention. This work analyzes four different communication systems, using a human-robot collaborative object transportation task as the experimental testbed: two intention predictors (one based on force prediction and another using an enhanced velocity prediction algorithm) and two explicit communication methods (a button interface and a voice-command recognition system). These systems were integrated into IVO, a custom mobile social robot equipped with a force sensor to detect the force exchanged between both agents and a LiDAR to perceive the environment. The collaborative task required transporting an object over a distance of 5 to 7 meters with obstacles in the middle, demanding rapid decisions and precise physical coordination. Seventy-five volunteers performed a total of 255 executions divided into three groups, testing the inference systems in the first round, the communication systems in the second, and the combined strategies in the third. The results show that 1) once sufficient performance is achieved, the human no longer notices or positively assesses further technical improvements; 2) the human prefers systems that feel more natural, even though they have higher failure rates; and 3) the preferred option is the right combination of both kinds of systems.
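
The abstract does not say how the combined strategies of the third round merge explicit commands with the predictors. As a purely illustrative sketch, one simple fusion rule lets an explicit command override the prediction and trusts the predictor only above a confidence threshold. The `arbitrate` function, the `ImplicitEstimate` type, the discrete intent labels, and the 0.7 threshold are all hypothetical assumptions, not the authors' method:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical discrete intents for a carrying task (assumption, not from the paper).
INTENTS = ("left", "right", "straight", "stop")


@dataclass
class ImplicitEstimate:
    """Output of a hypothetical implicit predictor (e.g. force- or velocity-based)."""
    intent: str
    confidence: float  # assumed to lie in [0, 1]


def arbitrate(explicit: Optional[str],
              implicit: Optional[ImplicitEstimate],
              current: str,
              threshold: float = 0.7) -> str:
    """Choose the intent the robot should follow (hypothetical policy).

    1. An explicit command (button press or recognized voice command) always wins.
    2. Otherwise, use the implicit prediction only if its confidence reaches `threshold`.
    3. Otherwise, keep the current intent.
    """
    if explicit is not None:
        if explicit not in INTENTS:
            raise ValueError(f"unknown explicit command: {explicit!r}")
        return explicit
    if implicit is not None and implicit.confidence >= threshold:
        return implicit.intent
    return current


if __name__ == "__main__":
    print(arbitrate(None, ImplicitEstimate("left", 0.9), "straight"))    # confident prediction is used
    print(arbitrate("stop", ImplicitEstimate("left", 0.9), "straight"))  # explicit command overrides it
    print(arbitrate(None, ImplicitEstimate("right", 0.4), "straight"))   # weak prediction is ignored
```

Giving explicit commands priority reflects the abstract's observation that people value channels that feel natural to them; how much weight the predictor deserves, and where the threshold should sit, would have to be tuned experimentally.
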
Problem

human-robot collaboration
intention prediction
explicit communication
multimodality
human intention
Innovation

human-robot collaboration
intention prediction
explicit communication
multimodal interaction
user preference
J. E. Domínguez-Vidal
Institut de Robòtica i Informàtica Industrial (CSIC-UPC), Llorens i Artigas 4-6, Barcelona, 08028, Spain; Universitat Politècnica de Catalunya - BarcelonaTech (UPC), Jordi Girona 31, Barcelona, 08034, Spain
Alberto Sanfeliu
Full Professor, Universitat Politècnica de Catalunya & Institut de Robòtica i Informàtica Industrial
Robotics
Artificial intelligence
Pattern Recognition
Human-Robot Interaction