🤖 AI Summary
This study addresses the problem that humans often fail to judge accurately whether to accept AI recommendations in human–AI collaboration, undermining complementary performance. To address this, we propose a confidence-based joint human–AI inference method: during robotic teleoperation, both the human operator and the AI produce decisions alongside associated confidence scores, and the system selects the higher-confidence judgment. We are the first to apply the maximum-confidence heuristic to simulated teleoperation and to systematically demonstrate that AI confidence calibration critically modulates collaborative efficacy. By developing a calibrated AI decision-support system, designing delay-sensitive control tasks, and conducting user studies, we find that well-calibrated AI confidence significantly improves joint inference accuracy, whereas miscalibration degrades overall performance. Our results establish metacognitive sensitivity—the human operator's ability to discern the relative reliability of their own versus the AI's judgments—as a fundamental prerequisite for effective human–AI complementarity.
📝 Abstract
Joint human-AI inference holds immense potential to improve outcomes in human-supervised robot missions. Current missions are generally in the AI-assisted setting, where the human operator makes the final inference based on the AI's recommendation. However, because human judgement about when to accept or reject the AI recommendation often fails, complementarity is rarely achieved. We investigate joint human-AI inference in which the inference made with higher confidence is selected. Through a user study with N=100 participants on a representative simulated robot teleoperation task, specifically the inference of robots' control delays, we show that: a) joint inference accuracy is higher, and the extent of the improvement is regulated by the confidence calibration of the AI agent; and b) humans change their inferences based on AI recommendations, and the extent and direction of this change is also regulated by the confidence calibration of the AI agent. Interestingly, our results show that pairing a poorly calibrated AI-based decision support system (AI-DSS) with humans hurts performance instead of helping the team, reiterating the need for AI-DSS with good metacognitive sensitivity. To the best of our knowledge, our study presents the first application of a maximum-confidence-based heuristic for joint human-AI inference within a simulated robot teleoperation task.
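The maximum-confidence heuristic at the heart of the study can be sketched in a few lines. The following is a minimal illustration only, assuming self-reported confidences on a common scale; the function name, labels, and tie-breaking rule are assumptions for exposition, not the authors' implementation:

```python
def joint_inference(human_label, human_conf, ai_label, ai_conf):
    """Maximum-confidence heuristic: select whichever judgment
    (human or AI) was reported with the higher confidence.
    Ties are broken in favor of the human here (an assumption)."""
    return human_label if human_conf >= ai_conf else ai_label

# Hypothetical example for a control-delay inference task:
# the AI reports higher confidence, so its label is selected.
decision = joint_inference("delayed", 0.6, "not delayed", 0.8)
# decision == "not delayed"
```

Note that this rule only yields complementarity when the confidence scores actually track accuracy: a miscalibrated AI that is overconfident when wrong will win the comparison exactly when it should lose it, which is consistent with the finding that poorly calibrated AI-DSS hurt team performance.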