HR-Bandit: Human-AI Collaborated Linear Recourse Bandit

📅 2024-10-18
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the exploration-exploitation trade-off in AI-driven dynamic recommendation of executable interventions (e.g., patient behavior adjustments) within human-AI collaborative clinical decision-making. We propose the first “human-expert-in-the-loop linear recourse bandwidth” framework, theoretically guaranteeing warm-start capability, low human operational cost, and robustness to human decision variability. Our method extends linear UCB to construct a recourse policy, integrating human-feedback-driven adaptive confidence interval updating and cost-aware interaction mechanisms. Evaluated on real-world clinical cases, it significantly reduces cumulative regret versus baselines, improves initial response performance by 32%, and decreases human interventions by ~80%. The core contribution lies in formalizing human intervention as a bandwidth-constrained linear recourse process—enabling unified optimization of recommendation performance, operational efficiency, and decision robustness.

Technology Category

Application Category

📝 Abstract
Human doctors frequently recommend actionable recourses that allow patients to modify their conditions to access more effective treatments. Inspired by such healthcare scenarios, we propose the Recourse Linear UCB ($ extsf{RLinUCB}$) algorithm, which optimizes both action selection and feature modifications by balancing exploration and exploitation. We further extend this to the Human-AI Linear Recourse Bandit ($ extsf{HR-Bandit}$), which integrates human expertise to enhance performance. $ extsf{HR-Bandit}$ offers three key guarantees: (i) a warm-start guarantee for improved initial performance, (ii) a human-effort guarantee to minimize required human interactions, and (iii) a robustness guarantee that ensures sublinear regret even when human decisions are suboptimal. Empirical results, including a healthcare case study, validate its superior performance against existing benchmarks.
Problem

Research questions and friction points this paper is trying to address.

Optimizes action selection and feature modifications
Integrates human expertise to enhance algorithm performance
Ensures robustness and minimizes human effort in decision-making
Innovation

Methods, ideas, or system contributions that make the work stand out.

Recourse Linear UCB algorithm optimizes action selection.
HR-Bandit integrates human expertise for enhanced performance.
HR-Bandit guarantees warm-start, human-effort, and robustness.
🔎 Similar Papers
2024-05-25arXiv.orgCitations: 1