Progressive Autonomy as Preference Learning: A Formalization of Trust Calibration for Agentic Tool Use

📅 2026-05-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses trust calibration for autonomous agents: how an agent should decide, action by action, whether to execute a tool call on its own or seek human approval first. The paper formalizes this decision as a preference-learning problem. A policy gateway maintains a Gaussian-process posterior over the human’s latent risk-tolerance function, uses a probit likelihood with approximate Gaussian-process classification to infer that function from binary approve/deny feedback, and queries the human where the approval outcome is most uncertain. The result is a three-region decision mechanism: “allow,” “block,” and “ask.” The paper argues that this setup is structurally an instance of Preferential Bayesian Optimization, from which it inherits both the inference machinery and the sample-efficiency argument for uncertainty-targeted querying. The objective differs: the goal is to partition the action space into the three regions rather than to optimize a design.

📝 Abstract

We formalize trust calibration for agentic tool use (deciding when an automated agent's proposed action may execute autonomously versus require human approval) as a preference-learning problem. A policy gateway maintains a Gaussian-process posterior over a latent human risk-tolerance function, observed through a probit likelihood on binary approve/deny feedback, and escalates to the human exactly where the approval outcome is most uncertain. We show this is structurally an instance of Preferential Bayesian Optimization, inheriting its inference machinery (approximate Gaussian-process classification) and its sample-efficiency argument (uncertainty-targeted querying), while differing in objective: classifying an action space into allow/block/ask regions rather than optimizing a design.
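
The gating rule described in the abstract can be illustrated with a minimal sketch. It assumes that some approximate Gaussian-process classifier has already produced a Gaussian posterior (mean and variance) over the latent risk-tolerance value at each candidate action; the inference step itself is not shown. The function names, the `allow_at` and `block_at` thresholds, and the "closest to 0.5" query rule are illustrative assumptions, not details taken from the paper. The only standard ingredient is the closed-form probit predictive probability under a Gaussian posterior, Φ(μ / √(1 + σ²)).

```python
import math
from typing import List, Tuple


def approval_probability(mean: float, var: float) -> float:
    """Predictive probability of human approval for one action.

    Assumes a probit likelihood and a Gaussian posterior N(mean, var) over
    the latent risk-tolerance value, which gives Phi(mean / sqrt(1 + var)).
    """
    z = mean / math.sqrt(1.0 + var)
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def gate(mean: float, var: float,
         allow_at: float = 0.95, block_at: float = 0.05) -> str:
    """Map a posterior to one of the three regions.

    The two thresholds are hypothetical tuning knobs chosen for illustration.
    """
    p = approval_probability(mean, var)
    if p >= allow_at:
        return "allow"
    if p <= block_at:
        return "block"
    return "ask"


def most_uncertain(posteriors: List[Tuple[float, float]]) -> int:
    """Index of the action whose approval probability is closest to 0.5.

    This is one simple way to read "most uncertain approval outcome";
    the paper may use a different acquisition rule.
    """
    return min(
        range(len(posteriors)),
        key=lambda i: abs(approval_probability(*posteriors[i]) - 0.5),
    )


if __name__ == "__main__":
    # (posterior mean, posterior variance) for three hypothetical actions.
    candidates = [(3.0, 0.1), (0.2, 4.0), (-3.0, 0.1)]
    for mean, var in candidates:
        print(mean, var, gate(mean, var))
    print("query index:", most_uncertain(candidates))
```

Note that a large posterior variance pulls the approval probability toward 0.5, so an action with a favorable mean but little supporting feedback still lands in the "ask" region until the human has been queried near it.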

Problem

Research questions and friction points this paper is trying to address.

trust calibration

agentic tool use

preference learning

autonomous decision-making

human-in-the-loop

Innovation

Methods, ideas, or system contributions that make the work stand out.

trust calibration

preference learning

Gaussian process classification