Bayesian Decision Making around Experts

📅 2025-10-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates optimal integration of structurally heterogeneous expert data in Bayesian multi-armed bandits, in two regimes: (i) offline pretraining, where the learner leverages historical data generated by an expert-optimal policy, and (ii) online collaborative learning, where the learner chooses at each step whether to update its beliefs from its own experience or from real-time expert feedback. Methodologically, it introduces an information-theoretic framework modeling how expert data shapes posterior beliefs, proposes a mutual-information-driven mechanism for dynamic data-source selection, and derives an information-aware regret bound. Theoretically, offline pretraining is proven to strictly tighten the regret upper bound. Empirically, the approach adaptively assesses expert reliability and achieves significant improvements in both sample efficiency and robustness over baseline methods.

📝 Abstract
Complex learning agents are increasingly deployed alongside existing experts, such as human operators or previously trained agents. However, it remains unclear how learners should optimally incorporate certain forms of expert data, which may differ in structure from the learner's own action-outcome experiences. We study this problem in the context of Bayesian multi-armed bandits, considering: (i) offline settings, where the learner receives a dataset of outcomes from the expert's optimal policy before interaction, and (ii) simultaneous settings, where the learner must choose at each step whether to update its beliefs based on its own experience or on the outcome simultaneously achieved by an expert. We formalize how expert data influences the learner's posterior, and prove that pretraining on expert outcomes tightens information-theoretic regret bounds by the mutual information between the expert data and the optimal action. For the simultaneous setting, we propose an information-directed rule in which the learner processes the data source that maximizes its one-step information gain about the optimal action. Finally, we propose strategies for how the learner can infer when to trust the expert and when not to, safeguarding the learner in cases where the expert is ineffective or compromised. By quantifying the value of expert data, our framework provides practical, information-theoretic algorithms for agents to intelligently decide when to learn from others.
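The pretraining claim can be read in standard information-ratio form. The display below is an illustrative reconstruction in the style of Russo and Van Roy's analysis, not the paper's exact statement; here $\bar{\Gamma}$ is a bound on the information ratio, $A^*$ the optimal action, and $D_E$ the expert dataset:

```latex
\mathbb{E}\left[\mathrm{Regret}(T)\right]
  \le \sqrt{\bar{\Gamma}\, T\, H(A^*)},
\qquad
\mathbb{E}\left[\mathrm{Regret}(T) \mid D_E\right]
  \le \sqrt{\bar{\Gamma}\, T\, \bigl(H(A^*) - I(A^*; D_E)\bigr)}
```

The gap between the two bounds is exactly the mutual information the abstract refers to: pretraining helps precisely to the extent that the expert data is informative about the optimal action.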
Problem

Research questions and friction points this paper is trying to address.

Optimally incorporating expert data that differs in structure from the learner's own experiences
Designing algorithms for sequential decision-making with expert guidance
Quantifying when to trust expert data versus the learner's own experiences
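The last question, deciding when to trust the expert, admits a compact Bayesian sketch. The snippet below is a hypothetical illustration, not the paper's safeguard: it scores the expert's observed 0/1 reward stream against a two-hypothesis model, where an effective expert succeeds at an assumed rate `p_good` and a compromised one at `p_bad`.

```python
def trust_posterior(outcomes, p_good=0.8, p_bad=0.5, prior=0.5):
    """Posterior probability that the expert is effective, given its
    observed 0/1 reward stream, under a two-hypothesis Bernoulli model."""
    post = prior
    for y in outcomes:
        lik_good = p_good if y else 1.0 - p_good
        lik_bad = p_bad if y else 1.0 - p_bad
        num = lik_good * post                       # Bayes rule, one outcome
        post = num / (num + lik_bad * (1.0 - post))
    return post

# A consistently rewarded expert earns trust; a coin-flip expert loses it.
trust_high = trust_posterior([1] * 20)
trust_low = trust_posterior([1, 0] * 10)
```

A learner could gate the expert data source on this posterior exceeding a threshold, falling back to its own experience otherwise.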
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian multi-armed bandits for expert data integration
Information-directed rule maximizes one-step information gain
Strategies infer expert trustworthiness to safeguard learning
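The information-directed rule above can be sketched in a few lines. This is a minimal illustration under assumed conditions, not the paper's algorithm: a two-armed Bernoulli bandit with independent Beta posteriors, where P(A* = k) is estimated by Monte Carlo and the learner compares the expected one-step drop in H(A*) from processing its own pull of arm 0 against processing the expert's pull of arm 1. All names and the example posteriors are hypothetical.

```python
import math
import random

def binary_entropy(p):
    """Entropy (bits) of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def p_best(posteriors, arm, n_samples=20000, rng=random):
    """Monte Carlo estimate of P(`arm` is optimal) under independent
    Beta(alpha, beta) posteriors, one (alpha, beta) pair per arm."""
    wins = 0
    for _ in range(n_samples):
        draws = [rng.betavariate(a, b) for a, b in posteriors]
        if max(range(len(draws)), key=lambda k: draws[k]) == arm:
            wins += 1
    return wins / n_samples

def info_gain(posteriors, arm, rng=random):
    """Expected one-step reduction in H(A*) from observing one Bernoulli
    outcome of `arm` (two-arm case, so A* is binary)."""
    h_now = binary_entropy(p_best(posteriors, 0, rng=rng))
    a, b = posteriors[arm]
    p_success = a / (a + b)  # posterior predictive probability of reward 1
    h_after = 0.0
    for y, p_y in ((1, p_success), (0, 1.0 - p_success)):
        updated = list(posteriors)
        updated[arm] = (a + y, b + 1 - y)  # conjugate Beta-Bernoulli update
        h_after += p_y * binary_entropy(p_best(updated, 0, rng=rng))
    return h_now - h_after

# Hypothetical scenario: the learner knows arm 0 well, while the expert
# is playing the still-uncertain arm 1.
rng = random.Random(0)
posteriors = [(50, 10), (1, 1)]                   # Beta(alpha, beta) per arm
gain_own = info_gain(posteriors, 0, rng=rng)      # learn from own pull
gain_expert = info_gain(posteriors, 1, rng=rng)   # learn from expert's pull
source = "expert" if gain_expert > gain_own else "own"
```

Because each gain is a Monte Carlo estimate, in practice one would reuse common random numbers or analytic posteriors to stabilize the comparison.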