🤖 AI Summary
Existing AI agents face a confirmation-timing dilemma in multi-step tasks: end-state confirmation risks error propagation, while step-by-step confirmation incurs excessive operational overhead.
Method: We propose a decision-theoretic model for placing intermediate confirmation points, formulating confirmation placement as a minimum-time scheduling problem and using an observed user error-handling pattern, Confirmation-Diagnosis-Correction-Redo (CDCR), to guide policy generation. The approach combines a formative study, theoretical modeling, and a controlled user experiment.
Contribution/Results: The method trades off interruption frequency against rollback cost. In a within-subjects study, 81% of participants preferred the proposed intermediate confirmation approach over confirm-at-end, and average task completion time decreased by 13.54%. Controllability, execution efficiency, and overall user experience are also reported to improve.
📝 Abstract
Existing AI agents typically execute multi-step tasks autonomously and only allow user confirmation at the end. During execution, users have little control, making the confirm-at-end approach brittle: a single error can cascade and force a complete restart. Confirming every step avoids such failures, but imposes tedious overhead. Balancing excessive interruptions against costly rollbacks remains an open challenge. We address this problem by modeling confirmation as a minimum-time scheduling problem. We conducted a formative study with eight participants, which revealed a recurring Confirmation-Diagnosis-Correction-Redo (CDCR) pattern in how users monitor errors. Based on this pattern, we developed a decision-theoretic model to determine time-efficient confirmation point placement. We then evaluated our approach using a within-subjects study in which 48 participants monitored AI agents and repaired their mistakes while executing tasks. Results show that 81 percent of participants preferred our intermediate confirmation approach over the confirm-at-end approach used by existing systems, and task completion time was reduced by 13.54 percent.
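To make the scheduling framing concrete, here is a minimal sketch of how confirmation points could be placed to minimize expected completion time. It is not the paper's model, which the abstract does not specify. The cost structure is an illustrative assumption: each step has a known duration and an independent error probability below 1, a confirmation costs a fixed amount of user time, and a segment that contains an error is diagnosed and corrected at a fixed cost and then redone until it passes. The names `segment_cost`, `place_confirmations`, `confirm_cost`, and `repair_cost` are hypothetical.

```python
from math import prod


def segment_cost(times, err, i, j, confirm_cost, repair_cost):
    """Expected time to complete steps i..j-1 with one confirmation after step j-1.

    Assumption (illustrative, not from the paper): a failed segment is
    diagnosed and corrected at cost `repair_cost`, then redone with the same
    error probabilities until it passes. All probabilities must be < 1.
    """
    attempt = sum(times[i:j]) + confirm_cost
    q = prod(1.0 - p for p in err[i:j])  # chance the whole segment is error-free
    # E = attempt + (1 - q) * (repair_cost + E)  =>  solve for E
    return (attempt + (1.0 - q) * repair_cost) / q


def place_confirmations(times, err, confirm_cost, repair_cost):
    """Dynamic program over segment boundaries.

    Returns (expected_total_time, confirmation_points), where each point k
    means "confirm after the k-th step" (1-based). The final step is always
    followed by a confirmation.
    """
    n = len(times)
    best = [0.0] + [float("inf")] * n  # best[j]: min expected time for steps 0..j-1
    prev = [0] * (n + 1)               # prev[j]: start of the last segment ending at j
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + segment_cost(times, err, i, j, confirm_cost, repair_cost)
            if cost < best[j]:
                best[j], prev[j] = cost, i
    points, j = [], n
    while j > 0:
        points.append(j)
        j = prev[j]
    return best[n], points[::-1]


if __name__ == "__main__":
    times = [10.0] * 6   # seconds per step (made-up values)
    err = [0.1] * 6      # per-step error probability (made-up values)
    total, points = place_confirmations(times, err, confirm_cost=2.0, repair_cost=5.0)
    print(f"expected time: {total:.2f}s, confirm after steps: {points}")
```

Under these assumptions, confirm-at-end and confirm-every-step are two of the schedules the dynamic program considers, so its result is never worse than either. When every error probability is zero, it reduces to a single confirmation at the end.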