🤖 AI Summary
This study addresses the pervasive overconfidence of large language model (LLM) agents in task execution—evidenced by a stark discrepancy between predicted success rates (e.g., 77%) and actual performance (e.g., 22%). The authors systematically evaluate the agents’ uncertainty estimation capabilities across three phases: before, during, and after task execution. Surprisingly, pre-execution assessments, despite limited information, tend to demonstrate better discriminative performance than conventional post-hoc analyses. To mitigate overconfidence, the work proposes an adversarial prompting strategy that reframes success prediction as a vulnerability detection task, integrating probabilistic estimation with calibration metrics. Experimental results across diverse tasks show that this approach achieves state-of-the-art calibration performance, effectively reducing the agents’ overconfidence bias.
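The adversarial reframing described above can be sketched as a prompt builder: rather than asking the agent "will this succeed?", the assessment prompt asks it to hunt for flaws first and only then produce a success probability. The function name and prompt wording below are illustrative assumptions, not the paper's actual prompt.

```python
def adversarial_assessment_prompt(task: str, trajectory: str) -> str:
    """Build an assessment prompt that frames success prediction as
    bug-finding. Hypothetical sketch; wording is not from the paper."""
    return (
        "You are a critical reviewer whose job is to find bugs.\n"
        f"Task: {task}\n"
        f"Agent trajectory:\n{trajectory}\n\n"
        "List every flaw, unmet requirement, or risky assumption in the "
        "trajectory above. Then, in light of those flaws, estimate the "
        "probability (0-100%) that the task was completed successfully."
    )

# Example usage with placeholder inputs:
prompt = adversarial_assessment_prompt(
    task="Rename all .txt files in the repo to .md",
    trajectory="1. Listed files. 2. Ran `mv *.txt *.md` (failed silently).",
)
print(prompt)
```

The intuition, per the summary, is that asking for vulnerabilities counteracts the model's default optimism: the elicited probability is conditioned on an explicit list of weaknesses rather than on a bare yes/no self-assessment.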
📝 Abstract
Can AI agents predict whether they will succeed at a task? We study agentic uncertainty by eliciting success probability estimates before, during, and after task execution. All results exhibit agentic overconfidence: some agents that succeed only 22% of the time predict 77% success. Counterintuitively, pre-execution assessment with strictly less information tends to yield better discrimination than standard post-execution review, though differences are not always significant. Adversarial prompting that reframes assessment as bug-finding achieves the best calibration.
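The calibration gap in the abstract's example can be made concrete with two standard metrics: the Brier score (mean squared error between predicted probabilities and 0/1 outcomes) and a simple expected calibration error (ECE). The data below is a hypothetical reconstruction of the abstract's numbers (an agent predicting 0.77 on every task while succeeding on 22 of 100), not the paper's actual results.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

def expected_calibration_error(probs, outcomes, n_bins=10):
    """Average |mean confidence - accuracy| over equal-width probability
    bins, weighted by the number of predictions in each bin."""
    bins = [[] for _ in range(n_bins)]
    for p, o in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # clamp p == 1.0 into last bin
        bins[idx].append((p, o))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        conf = sum(p for p, _ in b) / len(b)
        acc = sum(o for _, o in b) / len(b)
        ece += (len(b) / n) * abs(conf - acc)
    return ece

# Hypothetical data mirroring the abstract: 100 tasks, a flat 77%
# predicted success probability, but only 22 actual successes.
probs = [0.77] * 100
outcomes = [1] * 22 + [0] * 78

print(round(brier_score(probs, outcomes), 3))               # → 0.474
print(round(expected_calibration_error(probs, outcomes), 3))  # → 0.55
```

A perfectly calibrated predictor on this data would report 0.22 everywhere, giving an ECE of 0; the 0.55 gap here is exactly the 77%-vs-22% overconfidence the abstract highlights. Note that ECE measures calibration, while the discrimination result (pre- vs post-execution) would instead be measured by a ranking metric such as AUROC.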