🤖 AI Summary
Large language models (LLMs) exhibit hallucination in robotic planning, yielding high-confidence yet unsafe decisions, largely because existing methods fail to disentangle epistemic uncertainty from aleatoric (intrinsic) uncertainty. Method: We propose the first joint uncertainty estimation framework that decomposes epistemic uncertainty into two orthogonal dimensions, task clarity and task familiarity, and models them jointly with aleatoric uncertainty. The approach uses random network distillation to capture uncertainty over environmental dynamics and LLM-feature-driven multi-layer perceptron regression heads to quantify each of the three uncertainty types separately. Contribution/Results: Evaluated on kitchen manipulation and tabletop rearrangement tasks, the method substantially improves the calibration between uncertainty estimates and actual execution success, reducing mean calibration error by 32.7% and thereby enabling safer, more reliable LLM-based planning decisions.
📝 Abstract
Large language models (LLMs) demonstrate advanced reasoning abilities, enabling robots to understand natural language instructions and generate appropriately grounded high-level plans. However, LLM hallucinations present a significant challenge, often leading to overconfident yet potentially misaligned or unsafe plans. While researchers have explored uncertainty estimation to improve the reliability of LLM-based planning, existing studies have not sufficiently differentiated between epistemic and intrinsic uncertainty, limiting the effectiveness of uncertainty estimation. In this paper, we present Combined Uncertainty estimation for Reliable Embodied planning (CURE), which decomposes uncertainty into epistemic and intrinsic components, each estimated separately. Epistemic uncertainty is further subdivided into task clarity and task familiarity for more accurate evaluation. The overall uncertainty estimates are obtained using random network distillation and multi-layer perceptron regression heads driven by LLM features. We validated our approach in two distinct experimental settings: kitchen manipulation and tabletop rearrangement. The results show that, compared with existing methods, our approach yields uncertainty estimates that align more closely with actual execution outcomes.
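The task-familiarity component rests on random network distillation (RND): a predictor network is trained to imitate a frozen, randomly initialized target network on features from previously seen tasks, so the predictor's error is low on familiar inputs and high on novel ones. The sketch below illustrates only that RND mechanism; the linear networks, feature dimensions, and function names are illustrative assumptions, not the paper's actual architecture (which uses LLM features and MLP regression heads).

```python
import numpy as np

rng = np.random.default_rng(0)

DIM = 32  # hypothetical task-feature dimension (e.g. an LLM embedding)
OUT = 16  # output dimension of the random target network

# Fixed, randomly initialized target network: never trained.
W_target = rng.normal(size=(DIM, OUT))

# Predictor network: trained to imitate the target on familiar tasks.
W_pred = np.zeros((DIM, OUT))


def familiarity_uncertainty(x):
    """RND novelty score: squared error between predictor and frozen target."""
    return float(np.mean((x @ W_target - x @ W_pred) ** 2))


def train_predictor(features, lr=0.05, epochs=200):
    """Fit the predictor on features from previously encountered tasks."""
    global W_pred
    for _ in range(epochs):
        err = features @ W_pred - features @ W_target
        W_pred -= lr * features.T @ err / len(features)  # gradient step on MSE


# Features from "familiar" tasks cluster in one region of feature space...
familiar = rng.normal(loc=0.0, size=(256, DIM))
train_predictor(familiar)

# ...so an in-distribution input scores low and a shifted (novel) one scores high.
seen = rng.normal(loc=0.0, size=DIM)
novel = rng.normal(loc=5.0, size=DIM)
print(familiarity_uncertainty(seen) < familiarity_uncertainty(novel))  # True
```

Because the target is random and frozen, the predictor can only drive its error down on regions of feature space it has actually been trained on; the residual error therefore acts as a proxy for how unfamiliar a new task is.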