CapyMOA: Efficient Machine Learning for Data Streams in Python

2025-02-11
AI Summary
Real-time online learning and evaluation on data streams is often inefficient, and traditional online algorithms are difficult to integrate with deep learning. To address both problems, this paper proposes the first native Python open-source framework that deeply unifies MOA (Java) and PyTorch. The framework uses a lightweight JVM bridge for cross-language incremental model updates, introduces a sliding-window-based dynamic evaluation mechanism, and adopts a modular component architecture. It supports flexible data representations, extensible model integration, and end-to-end reproducible streaming experiments. Empirical evaluation on multiple standard stream datasets shows that, compared to baseline methods, the approach reduces inference latency (by 37% on average), improves classification accuracy (+2.1–4.8%), and is more robust to concept drift. This work provides a unified, efficient, and accessible infrastructure for dynamic learning across domains.

Abstract
CapyMOA is an open-source library designed for efficient machine learning on streaming data. It provides a structured framework for real-time learning and evaluation, featuring a flexible data representation. CapyMOA includes an extensible architecture that allows integration with external frameworks such as MOA and PyTorch, facilitating hybrid learning approaches that combine traditional online algorithms with deep learning techniques. By emphasizing adaptability, scalability, and usability, CapyMOA allows researchers and practitioners to tackle dynamic learning challenges across various domains.
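
To make "real-time learning and evaluation" concrete, the following minimal sketch shows test-then-train (prequential) evaluation with a sliding-window accuracy, a common way to evaluate learners on streams. It is plain Python and does not use CapyMOA's API; `MajorityClassLearner`, `prequential_windowed_accuracy`, and the toy stream are hypothetical names and data introduced only for illustration.

```python
from collections import Counter, deque
from typing import Hashable, Iterable, List, Optional, Sequence, Tuple

# One stream instance: (feature vector, label). Assumed layout for this sketch.
Instance = Tuple[Sequence[float], Hashable]


class MajorityClassLearner:
    """Hypothetical baseline learner: predicts the most frequent label seen so far."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def predict(self, x: Sequence[float]) -> Optional[Hashable]:
        if not self._counts:
            return None  # nothing has been learned yet
        return self._counts.most_common(1)[0][0]

    def train(self, x: Sequence[float], y: Hashable) -> None:
        self._counts[y] += 1


def prequential_windowed_accuracy(
    stream: Iterable[Instance], learner, window_size: int = 100
) -> List[float]:
    """Test on each instance first, then train on it.

    Returns the accuracy over the most recent `window_size` predictions
    after every instance.
    """
    recent = deque(maxlen=window_size)  # old outcomes fall out automatically
    accuracies: List[float] = []
    for x, y in stream:
        recent.append(learner.predict(x) == y)  # test ...
        learner.train(x, y)                     # ... then train
        accuracies.append(sum(recent) / len(recent))
    return accuracies


# Toy stream with an abrupt change: label "a" for 50 instances, then label "b".
stream = [((float(i),), "a") for i in range(50)]
stream += [((float(i),), "b") for i in range(50, 150)]

acc = prequential_windowed_accuracy(stream, MajorityClassLearner(), window_size=20)
```

Because the window only covers recent predictions, the accuracy drops sharply after the label change and recovers once the learner adapts, which a cumulative accuracy would smooth over. Any object exposing `predict` and `train` could be passed in place of the baseline, which is the kind of pluggability an extensible architecture is meant to provide.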
Problem

Research questions and friction points this paper is trying to address.

Efficient machine learning on streaming data
Real-time learning and evaluation framework
Integration of traditional and deep learning techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-source library for streaming data
Extensible architecture for hybrid learning
Real-time learning and evaluation framework
Authors

H. Gomes
Victoria University of Wellington, New Zealand

Anton Lee
Victoria University of Wellington, New Zealand

N. Gunasekara
Halmstad University, Sweden

Yibin Sun
AI Institute, University of Waikato, New Zealand

G. Cassales
AI Institute, University of Waikato, New Zealand

Justin Liu
AI Institute, University of Waikato, New Zealand

Marco Heyden
Karlsruhe Institute of Technology, Germany

Vitor Cerqueira
University of Porto, Faculty of Engineering
Research interests: Machine learning, Time series

M. Bahri
Sorbonne Université, France

Yun Sing Koh
The University of Auckland
Research interests: Data Mining and Machine Learning, Data Stream Mining, Continual Learning

Bernhard Pfahringer
Professor of Computer Science, University of Waikato
Research interests: Machine Learning, Data Mining

A. Bifet
AI Institute, University of Waikato, New Zealand