🤖 AI Summary
Existing approaches to concept drift in non-stationary data streams struggle to balance efficiency and specialization due to reliance on coarse-grained adaptation or simplistic ensembles.
Method: This paper proposes an online Mixture-of-Experts (MoE) architecture featuring a symbiotic learning loop between a compact neural router and an incrementally updated pool of Hoeffding tree experts. A multi-hot correctness mask provides precise supervision to the router, enabling rapid expert specialization; a dynamic expert selection strategy further improves predictive accuracy.
Results: Evaluated on nine benchmark data streams—including abrupt and gradual drifts as well as real-world scenarios—the method matches state-of-the-art adaptive ensemble methods in predictive performance while achieving significantly higher resource efficiency and faster adaptation speed.
📝 Abstract
Learning from non-stationary data streams subject to concept drift requires models that can adapt on the fly while remaining resource-efficient. Existing adaptive ensemble methods often rely on coarse-grained adaptation mechanisms or simple voting schemes that fail to optimally leverage specialized knowledge. This paper introduces DriftMoE, an online Mixture-of-Experts (MoE) architecture that addresses these limitations through a novel co-training framework. DriftMoE features a compact neural router that is co-trained alongside a pool of incremental Hoeffding tree experts. The key innovation lies in a symbiotic learning loop that enables expert specialization: the router selects the most suitable expert for prediction, the relevant experts update incrementally with the true label, and the router refines its parameters using a multi-hot correctness mask that reinforces every accurate expert. This feedback loop provides the router with a clear training signal while accelerating expert specialization. We evaluate DriftMoE's performance across nine state-of-the-art data stream learning benchmarks spanning abrupt, gradual, and real-world drifts, testing two distinct configurations: one where experts specialize on data regimes (multi-class variant), and another where they focus on single-class specialization (task-based variant). Our results demonstrate that DriftMoE achieves results competitive with state-of-the-art adaptive stream learning ensembles, offering a principled and efficient approach to concept drift adaptation. All code, data pipelines, and reproducibility scripts are available in our public GitHub repository: https://github.com/miguel-ceadar/drift-moe.
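The symbiotic loop described above (router selects an expert, experts learn from the true label, router learns from a multi-hot correctness mask) can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the linear softmax router, the per-expert binary cross-entropy update, the stand-in majority-class experts (replacing real Hoeffding trees), and all names and hyperparameters here are assumptions for illustration.

```python
import numpy as np

class MajorityClassExpert:
    """Stand-in for a Hoeffding tree: a trivial incremental majority-class learner."""
    def __init__(self, n_classes):
        self.counts = np.zeros(n_classes)

    def predict(self, x):
        return int(np.argmax(self.counts))

    def learn_one(self, x, y):
        self.counts[y] += 1

class DriftMoESketch:
    """Illustrative sketch of the symbiotic loop: a linear router co-trained
    with a pool of incremental experts via a multi-hot correctness mask."""
    def __init__(self, n_features, n_experts, n_classes, lr=0.1):
        self.W = np.zeros((n_experts, n_features))  # router weights
        self.b = np.zeros(n_experts)                # router bias
        self.experts = [MajorityClassExpert(n_classes) for _ in range(n_experts)]
        self.lr = lr

    def _gate_logits(self, x):
        return self.W @ x + self.b

    def predict(self, x):
        # Route the instance to the expert the router scores highest.
        k = int(np.argmax(self._gate_logits(x)))
        return self.experts[k].predict(x)

    def learn_one(self, x, y):
        # 1) Multi-hot correctness mask: 1 for every expert that predicts y.
        mask = np.array([1.0 if e.predict(x) == y else 0.0
                         for e in self.experts])
        # 2) Router update: per-expert sigmoid + binary cross-entropy,
        #    reinforcing every accurate expert (gradient is p - mask).
        p = 1.0 / (1.0 + np.exp(-self._gate_logits(x)))
        grad = p - mask
        self.W -= self.lr * np.outer(grad, x)
        self.b -= self.lr * grad
        # 3) Expert update: here only the top-routed expert trains on the
        #    true label; which experts update is a design choice in the paper.
        k = int(np.argmax(self._gate_logits(x)))
        self.experts[k].learn_one(x, y)
```

In an actual streaming setting the stand-in experts would be replaced by incremental Hoeffding trees (e.g. from an online learning library) and the router by a small neural network, but the supervision signal, a multi-hot mask marking every expert that was correct on the current instance, remains the core idea.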