When Planning Fails Despite Correct Execution: On Epistemic Calibration for LLM-Based Multi-Agent Systems

📅 2026-05-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses planning failures in large language model (LLM)-based multi-agent systems that arise when agents misjudge their own knowledge while evaluating plan feasibility, a phenomenon the authors term “epistemic miscalibration.” The paper characterizes this issue as both latent and dynamic and introduces the Epistemic Planning Calibration Agentic Workflow (EPC-AW). Rather than verifying feasibility directly, EPC-AW assesses whether plans remain supported under varying information conditions. It combines Information-consistency-based Plan Selection, which favors plans whose evaluations are stable across agents, with Consistency-guided Epistemic State Refinement, which uses past discrepancies to adjust agents’ epistemic states over time. Experiments show an average 9.75% improvement in system-level success rate.
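The feedback step that adjusts an agent's state from past discrepancies could be pictured roughly as follows. This is a minimal sketch, not the paper's implementation: the `EpistemicState` class, the running-mean bias, and the subtraction-based correction are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class EpistemicState:
    """Hypothetical per-agent record of past over- or under-confidence."""

    discrepancies: list = field(default_factory=list)

    def record(self, stated_confidence: float, succeeded: bool) -> None:
        # Discrepancy is positive when the agent was overconfident,
        # negative when it was underconfident.
        outcome = 1.0 if succeeded else 0.0
        self.discrepancies.append(stated_confidence - outcome)

    def bias(self) -> float:
        # Assumed estimator: plain mean of all past discrepancies.
        if not self.discrepancies:
            return 0.0
        return sum(self.discrepancies) / len(self.discrepancies)

    def calibrate(self, stated_confidence: float) -> float:
        # Shift the new estimate by the historical bias and clamp to [0, 1].
        return min(1.0, max(0.0, stated_confidence - self.bias()))


state = EpistemicState()
state.record(0.9, succeeded=False)  # agent was overconfident
state.record(0.9, succeeded=True)   # agent was slightly underconfident
adjusted = state.calibrate(0.9)     # lower than 0.9, since past bias is positive
```

A running mean weights old and new discrepancies equally; a decayed average would be one way to reflect the dynamic nature of the miscalibration, though the summary does not say which the authors use.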

📝 Abstract

LLM-based multi-agent systems can fail even when planned actions are executed correctly because agents may misjudge their knowledge when evaluating plan feasibility, a phenomenon we term epistemic miscalibration in planning. Unlike execution errors, epistemic miscalibration is latent during planning, as generated plans can remain self-consistent and executable without observable errors; the miscalibration is also dynamic, as new information can alter feasibility assessments, potentially obscuring past miscalibration signals and causing them to recur over time. To address this, we propose the Epistemic Planning Calibration Agentic Workflow (EPC-AW), which assesses whether plans remain supported under varying information conditions rather than directly verifying feasibility. EPC-AW employs Information-consistency-based Plan Selection, selecting plans whose evaluations are stable across agents, together with Consistency-guided Epistemic State Refinement to adapt calibration over time by leveraging past discrepancies to guide future planning. Experiments show that EPC-AW improves system-level success by an average of 9.75%.
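The idea of selecting plans whose evaluations are stable can be illustrated with a small sketch. Everything here is an assumption for illustration: the abstract does not specify a scoring rule, so the function name `select_plan`, the `stability_weight` parameter, and the "mean minus weighted spread" score are hypothetical stand-ins for whatever criterion EPC-AW actually applies.

```python
from statistics import mean, pstdev


def select_plan(evaluations: dict, stability_weight: float = 1.0) -> str:
    """Pick the plan whose feasibility scores are both high and stable.

    `evaluations` maps a plan name to feasibility scores in [0, 1], one per
    agent or information condition (a hypothetical input format).
    """
    if not evaluations:
        raise ValueError("no plans to select from")

    def score(plan: str) -> float:
        scores = evaluations[plan]
        # Penalize plans whose assessed feasibility swings across conditions.
        return mean(scores) - stability_weight * pstdev(scores)

    return max(evaluations, key=score)


evaluations = {
    # Looks very feasible to some agents, infeasible to another.
    "plan_a": [0.9, 0.2, 0.8],
    # Moderately feasible, and the agents agree.
    "plan_b": [0.7, 0.65, 0.7],
}
chosen = select_plan(evaluations)
```

In this example `plan_a` has the highest single score, but its evaluation changes sharply depending on who assesses it, so the sketch prefers `plan_b`. Setting `stability_weight` to 0 reduces the rule to picking the highest average score.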

Problem

Research questions and friction points this paper is trying to address.

epistemic miscalibration

LLM-based multi-agent systems

plan feasibility

epistemic calibration

planning failure

Innovation

Methods, ideas, or system contributions that make the work stand out.

epistemic calibration

LLM-based multi-agent systems

planning feasibility