Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models

📅 2024-06-05
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the dynamic weighted fusion of multiple pre-trained large language models (LLMs) for online time-series forecasting. The authors propose an online Bayesian gating framework grounded in continuous-time hidden Markov models (CT-HMMs), the first to formulate expert selection as a CT-HMM and to employ the Wonham–Shiryaev filter for efficient online state inference. They further design MoE-F, a plug-and-play mixture-of-experts algorithm that achieves theoretical optimality (minimizing regret bounds) while ensuring robust aggregation. The method integrates stochastic filtering, a parallel filter architecture, and closed-form optimal aggregation. Empirical evaluation on financial market movement prediction shows a 17-percentage-point absolute improvement in F1 score over the best single model (a relative gain of 48.5%); the approach also significantly outperforms specialized time-series models on long-horizon forecasting tasks.
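
For reference, the Wonham–Shiryaev filter named above is the classical optimal filter for a finite-state, continuous-time Markov chain observed in noise. In its standard textbook form (shown here for orientation only; the paper's exact observation model and notation may differ), with hidden state $X_t$, generator $Q = (q_{ij})$, and observations $dY_t = h(X_t)\,dt + \sigma\,dW_t$, the posterior $\pi_t^{(i)} = \mathbb{P}(X_t = i \mid \mathcal{F}_t^Y)$ evolves as

$$ d\pi_t^{(i)} = \sum_j q_{ji}\,\pi_t^{(j)}\,dt + \frac{\pi_t^{(i)}\,\bigl(h_i - \bar h_t\bigr)}{\sigma^2}\,\bigl(dY_t - \bar h_t\,dt\bigr), \qquad \bar h_t = \sum_j h_j\,\pi_t^{(j)}, $$

which, in the expert-selection setting, can be read as the posterior probability that expert $i$ is currently the best-performing one.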

📝 Abstract
We propose MoE-F, a formalized mechanism for combining $N$ pre-trained Large Language Models (LLMs) for online time-series prediction by adaptively forecasting the best weighting of LLM predictions at every time step. Our mechanism leverages the conditional information in each expert's running performance to forecast the best combination of LLMs for predicting the time series in its next step. Diverging from static (learned) Mixture of Experts (MoE) methods, our approach employs time-adaptive stochastic filtering techniques to combine experts. By framing the expert selection problem as a finite state-space, continuous-time Hidden Markov model (HMM), we can leverage the Wonham-Shiryaev filter. Our approach first constructs $N$ parallel filters corresponding to each of the $N$ individual LLMs. Each filter proposes its best combination of LLMs, given the information that they have access to. Subsequently, the $N$ filter outputs are optimally aggregated to maximize their robust predictive power, and this update is computed efficiently via a closed-form expression, generating our ensemble predictor. Our contributions are: **(I)** the MoE-F plug-and-play filtering harness algorithm, **(II)** theoretical optimality guarantees of the proposed filtering-based gating algorithm (via optimality guarantees for its parallel Bayesian filtering and its robust aggregation steps), and **(III)** empirical evaluation and ablative results using state-of-the-art foundational and MoE LLMs on a real-world __Financial Market Movement__ task where MoE-F attains a remarkable 17% absolute and 48.5% relative F1 measure improvement over the next best performing individual LLM expert predicting short-horizon market movement based on streaming news. Further, we provide empirical evidence of substantial performance gains in applying MoE-F over specialized models in the long-horizon time-series forecasting domain.
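
As a rough illustration of the mechanism the abstract describes (this is not the authors' implementation; the discretization, the loss-based likelihood, and all function names such as `filter_step` and `moe_f_like_gate` are assumptions), the sketch below runs a discrete-time Bayesian filter over "which expert is currently best", driven by each expert's running losses, and fuses the expert predictions with the resulting posterior weights:

```python
import numpy as np

def filter_step(pi, Q, losses, dt=1.0, temperature=1.0):
    """One discrete-time Bayesian filter update over 'which expert is best'.

    pi:     current posterior over the N experts (hidden states of an
            assumed expert-switching Markov chain)
    Q:      N x N generator matrix of that chain (rows sum to zero)
    losses: most recent per-expert losses, treated as a noisy observation
            of the hidden 'best expert' state (illustrative likelihood)
    """
    # Prediction step: propagate the posterior through the chain dynamics.
    pi = pi + dt * (Q.T @ pi)
    pi = np.clip(pi, 1e-12, None)
    # Correction step: experts with lower recent loss get higher likelihood.
    likelihood = np.exp(-losses / temperature)
    pi = pi * likelihood
    return pi / pi.sum()

def moe_f_like_gate(expert_preds, expert_losses, Q):
    """Fuse N expert prediction streams with filter-derived weights.

    expert_preds:  (T, N) array of per-expert predictions over time
    expert_losses: (T, N) array of per-expert realized losses
    Returns the (T,) fused prediction stream.
    """
    T, N = expert_preds.shape
    pi = np.full(N, 1.0 / N)                       # uniform prior over experts
    fused = np.empty(T)
    for t in range(T):
        fused[t] = pi @ expert_preds[t]            # weight experts by posterior
        pi = filter_step(pi, Q, expert_losses[t])  # update after observing losses
    return fused

# Toy usage with 3 hypothetical experts and a symmetric switching rate.
rng = np.random.default_rng(0)
N, T = 3, 50
Q = 0.05 * (np.ones((N, N)) - N * np.eye(N))       # generator: rows sum to zero
preds = rng.normal(size=(T, N))
losses = rng.random(size=(T, N))
print(moe_f_like_gate(preds, losses, Q)[:5])
```

The sketch collapses the paper's two-stage design (N parallel filters followed by a closed-form robust aggregation) into a single filter for brevity; the key idea it illustrates is that the gating weights are updated online from each expert's observed performance rather than learned once offline.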
Problem

Research questions and friction points this paper is trying to address.

Online time-series prediction using LLMs
Adaptive weighting of LLM predictions
Stochastic filtering for expert combination
Innovation

Methods, ideas, or system contributions that make the work stand out.

Stochastic filtering-based online gating
Hidden Markov model for expert selection
Parallel Bayesian filtering and aggregation