AI Summary
Autonomous agents operating in partially observable environments face challenges in dynamically adapting to unreliable external advisors. Method: This paper proposes a Bayesian adaptive sequential decision-making framework that models advisor reliability as a latent variable integrated into the agent's belief state; designs a learnable "query" action; and jointly optimizes trust updating and query timing within a reinforcement learning and POMDP framework. Real-time advisor type inference enables online estimation and adaptation of advice quality. Contribution/Results: To our knowledge, this is the first work to unify dynamic credibility learning, active querying decisions, and sequential planning. Experiments demonstrate robust performance under abrupt or gradual changes in advisor reliability, rapid convergence, significantly reduced redundant queries, and improved human-agent collaborative decision-making efficiency.
Abstract
Autonomous agents operating in sequential decision-making tasks under uncertainty can benefit from external action suggestions, which provide valuable guidance but inherently vary in reliability. Existing methods for incorporating such advice typically assume that suggester quality is static and known, which limits practical deployment. We introduce a framework that dynamically learns and adapts to varying suggester reliability in partially observable environments. First, we integrate suggester quality directly into the agent's belief representation, enabling agents to infer and adjust their reliance on suggestions through Bayesian inference over suggester types. Second, we introduce an explicit "ask" action allowing agents to strategically request suggestions at critical moments, balancing informational gains against acquisition costs. Experimental evaluation demonstrates robust performance across varying suggester qualities, adaptation to changing reliability, and strategic management of suggestion requests. This work provides a foundation for adaptive human-agent collaboration by addressing unreliable suggestions in uncertain environments.
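The Bayesian inference over suggester types mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation. It assumes a hypothetical suggester model in which a suggester of type `q` recommends the optimal action with probability `q` and otherwise picks uniformly among the remaining actions. The function name `update_type_belief` and the inputs `type_belief` and `p_optimal` are illustrative names, not taken from the paper.

```python
from typing import Dict, Sequence


def update_type_belief(
    type_belief: Dict[float, float],
    p_optimal: Sequence[float],
    suggested: int,
) -> Dict[float, float]:
    """Posterior over suggester types after observing one suggested action.

    type_belief: prior probability of each candidate accuracy q (assumed discrete).
    p_optimal:   agent's current probability that each action is optimal,
                 e.g. derived from its belief over hidden states (assumption).
    suggested:   index of the action the suggester recommended.
    """
    n_actions = len(p_optimal)
    if n_actions < 2:
        raise ValueError("need at least two actions")
    p_hit = p_optimal[suggested]

    unnormalized = {}
    for q, prior in type_belief.items():
        # Assumed likelihood: correct with probability q, otherwise a
        # uniformly random wrong action, marginalized over the agent's
        # uncertainty about which action is optimal.
        likelihood = q * p_hit + (1.0 - q) * (1.0 - p_hit) / (n_actions - 1)
        unnormalized[q] = prior * likelihood

    total = sum(unnormalized.values())
    if total == 0.0:
        # Observation impossible under every type: keep the prior unchanged.
        return dict(type_belief)
    return {q: w / total for q, w in unnormalized.items()}


# Two candidate suggester types: reliable (q=0.9) and unreliable (q=0.3).
prior = {0.9: 0.5, 0.3: 0.5}
# The agent already believes action 0 is most likely optimal.
p_optimal = [0.8, 0.1, 0.1]

agree = update_type_belief(prior, p_optimal, suggested=0)
disagree = update_type_belief(prior, p_optimal, suggested=1)
```

Under these assumptions, a suggestion that agrees with the agent's own estimate shifts belief toward the reliable type, and a disagreeing suggestion shifts it away. In a full POMDP formulation the type would sit inside the joint belief state alongside the environment state. The decision of when to take the "ask" action would come from planning over that joint belief, which this sketch does not cover.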