🤖 AI Summary
This paper investigates how to optimize the accuracy of collective decision-making by multiple context-aware experts. We propose an expert-guided bandit learning framework and design two novel online aggregation algorithms: (1) UCB-driven pruned majority voting and (2) online majority voting with expert weights adapted dynamically to each expert's predictive capability. To our knowledge, these are the first aggregation methods for multi-expert ensembles with rigorous no-regret theoretical guarantees. Our approach jointly incorporates contextual modeling, sequential expert elimination, and dynamic weight adaptation, and we prove a sublinear regret bound in the bandit setting. Empirical evaluation on online fine-tuning of large language models demonstrates that our methods significantly improve response accuracy over baseline approaches, validating both their theoretical soundness and practical efficacy.
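The first algorithm's pruning step can be illustrated with a minimal sketch of UCB-driven successive elimination. This is not the paper's implementation; the function name, confidence-radius formula (a standard Hoeffding-style bound), and the Bernoulli expert model are illustrative assumptions.

```python
import math
import random

def successive_elimination(expert_probs, horizon, delta=0.05, seed=0):
    """Sketch of UCB-style successive elimination (hypothetical parameters):
    repeatedly sample every surviving expert, then drop any expert whose
    upper confidence bound falls below the best lower confidence bound."""
    rng = random.Random(seed)
    k = len(expert_probs)
    counts = [0] * k      # pulls per expert
    means = [0.0] * k     # running empirical accuracy per expert
    active = set(range(k))
    for t in range(1, horizon + 1):
        for i in list(active):
            # Simulated feedback: expert i is correct with prob expert_probs[i]
            reward = 1.0 if rng.random() < expert_probs[i] else 0.0
            counts[i] += 1
            means[i] += (reward - means[i]) / counts[i]
        # Hoeffding-style confidence radius for each surviving expert
        rad = {i: math.sqrt(math.log(4 * k * t * t / delta) / (2 * counts[i]))
               for i in active}
        # Prune experts whose UCB is below the best LCB
        best_lcb = max(means[i] - rad[i] for i in active)
        active = {i for i in active if means[i] + rad[i] >= best_lcb}
    return active, means

# Usage: three experts with accuracies 0.9, 0.6, 0.55 --
# the two weaker experts are pruned well before the horizon.
survivors, means = successive_elimination([0.9, 0.6, 0.55], horizon=2000)
```

In the paper's setting the surviving committee would then answer by majority vote; here the simulation stops at the pruning step.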
📝 Abstract
We explore the use of expert-guided bandit learning, which we refer to as online mixture-of-experts (OMoE). In this setting, given a context, a candidate committee of experts must determine how to aggregate their outputs to achieve optimal aggregate accuracy. We propose two algorithms for this problem. The first combines aggregate voting with UCB-driven successive elimination, efficiently pruning suboptimal exploration actions. The second employs an online weighted-majority-voting mechanism, assigning each expert voting power proportional to its predictive power. We derive regret guarantees in the bandit setting under idealized assumptions and provide supporting empirical results. As an application, we apply these methods to the online fine-tuning of a set of expert large language models (LLMs): after each response, the generative LLM dynamically reweights its experts and/or selects the optimal committee of experts to generate the most accurate response. Our results introduce new methodologies and no-regret guarantees for combining multiple experts to improve the performance of the aggregate model overall.
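The second mechanism can be sketched with a standard multiplicative-weights (Hedge-style) update, a common way to realize online weighted majority voting. The function names, the binary-vote setting, and the learning rate `eta` are illustrative assumptions, not the paper's actual construction.

```python
import math

def weighted_majority_vote(weights, votes):
    """Aggregate binary votes: predict 1 iff the total weight of
    experts voting 1 is at least half the total weight (sketch)."""
    score = sum(w for w, v in zip(weights, votes) if v == 1)
    return 1 if score >= sum(weights) / 2 else 0

def hedge_update(weights, votes, truth, eta=0.5):
    """Multiplicative-weights update (hypothetical learning rate eta):
    experts that voted incorrectly are down-weighted by exp(-eta)."""
    return [w * math.exp(-eta * (v != truth)) for w, v in zip(weights, votes)]

# Usage: two rounds in which expert 2 votes wrong both times,
# so its voting power shrinks while the others keep weight 1.0.
weights = [1.0, 1.0, 1.0]
for votes, truth in [([1, 1, 0], 1), ([0, 0, 1], 0)]:
    prediction = weighted_majority_vote(weights, votes)
    weights = hedge_update(weights, votes, truth)
```

In the LLM application described above, the "vote" would be each expert model's response and the feedback signal its observed accuracy; the weight update plays the role of the dynamic reweighting step.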