🤖 AI Summary
This paper investigates how to optimize the accuracy of collective decision-making by multiple context-aware experts. We propose an expert-guided bandit learning framework and design two novel online aggregation algorithms: (1) UCB-driven pruned majority voting and (2) online majority voting with expert weights adapted dynamically to each expert's predictive capability. To our knowledge, these are the first aggregation methods for multi-expert ensembles with rigorous no-regret theoretical guarantees. Our approach jointly incorporates contextual modeling, sequential expert elimination, and dynamic weight adaptation, and we prove a sublinear regret bound in the bandit setting. Empirical evaluation on online fine-tuning of large language models demonstrates that our methods significantly improve response accuracy over baseline approaches, validating both their theoretical soundness and practical efficacy.
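The first algorithm's pruning step can be illustrated with a minimal sketch of UCB-driven successive elimination. This is not the paper's implementation; the function name, confidence-radius formula (a standard Hoeffding-style bound), and the Bernoulli expert model are illustrative assumptions.

```python
import math
import random

def successive_elimination(expert_probs, horizon, delta=0.05, seed=0):
    """Sketch of UCB-style successive elimination (hypothetical parameters):
    repeatedly sample every surviving expert, then drop any expert whose
    upper confidence bound falls below the best lower confidence bound."""
    rng = random.Random(seed)
    k = len(expert_probs)
    counts = [0] * k      # pulls per expert
    means = [0.0] * k     # running empirical accuracy per expert
    active = set(range(k))
    for t in range(1, horizon + 1):
        for i in list(active):
            # Simulated feedback: expert i is correct with prob expert_probs[i]
            reward = 1.0 if rng.random() < expert_probs[i] else 0.0
            counts[i] += 1
            means[i] += (reward - means[i]) / counts[i]
        # Hoeffding-style confidence radius for each surviving expert
        rad = {i: math.sqrt(math.log(4 * k * t * t / delta) / (2 * counts[i]))
               for i in active}
        # Prune experts whose UCB is below the best LCB
        best_lcb = max(means[i] - rad[i] for i in active)
        active = {i for i in active if means[i] + rad[i] >= best_lcb}
    return active, means

# Usage: three experts with accuracies 0.9, 0.6, 0.55 --
# the two weaker experts are pruned well before the horizon.
survivors, means = successive_elimination([0.9, 0.6, 0.55], horizon=2000)
```

In the paper's setting the surviving committee would then answer by majority vote; here the simulation stops at the pruning step.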
📝 Abstract
We explore the use of expert-guided bandit learning, which we refer to as online mixture-of-experts (OMoE). In this setting, given a context, a candidate committee of experts must determine how to aggregate their outputs to achieve optimal aggregate accuracy. We propose two algorithms for this problem. The first combines aggregate voting with UCB-driven successive elimination, efficiently pruning suboptimal exploration actions. The second employs an online weighted-majority-voting mechanism, assigning each expert voting power proportional to its predictive power. We derive regret guarantees in the bandit setting under idealized assumptions and provide supporting empirical results. As an application, we apply these methods to the online fine-tuning of a set of expert large language models (LLMs): after each response, the generative LLM dynamically reweights its experts and/or selects the optimal committee of experts to generate the most accurate response. Our results introduce new methodologies and no-regret guarantees for combining multiple experts to improve the performance of the aggregate model overall.
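The second mechanism can be sketched with a standard multiplicative-weights (Hedge-style) update, a common way to realize online weighted majority voting. The function names, the binary-vote setting, and the learning rate `eta` are illustrative assumptions, not the paper's actual construction.

```python
import math

def weighted_majority_vote(weights, votes):
    """Aggregate binary votes: predict 1 iff the total weight of
    experts voting 1 is at least half the total weight (sketch)."""
    score = sum(w for w, v in zip(weights, votes) if v == 1)
    return 1 if score >= sum(weights) / 2 else 0

def hedge_update(weights, votes, truth, eta=0.5):
    """Multiplicative-weights update (hypothetical learning rate eta):
    experts that voted incorrectly are down-weighted by exp(-eta)."""
    return [w * math.exp(-eta * (v != truth)) for w, v in zip(weights, votes)]

# Usage: two rounds in which expert 2 votes wrong both times,
# so its voting power shrinks while the others keep weight 1.0.
weights = [1.0, 1.0, 1.0]
for votes, truth in [([1, 1, 0], 1), ([0, 0, 1], 0)]:
    prediction = weighted_majority_vote(weights, votes)
    weights = hedge_update(weights, votes, truth)
```

In the LLM application described above, the "vote" would be each expert model's response and the feedback signal its observed accuracy; the weight update plays the role of the dynamic reweighting step.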