CapyMOA: Efficient Machine Learning for Data Streams in Python

📅 2025-02-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Online learning and evaluation on data streams is often too slow for real-time use, and traditional online algorithms are difficult to combine with deep learning. To address both problems, this paper proposes the first native Python open-source framework that deeply unifies MOA (Java) and PyTorch. The framework uses a lightweight JVM bridge for cross-language incremental model updates, introduces a sliding-window-based dynamic evaluation mechanism, and adopts a modular component architecture. It supports flexible data representations, extensible model integration, and end-to-end reproducible streaming experiments. Empirical evaluation on multiple standard stream datasets shows that, compared to baseline methods, the approach reduces inference latency (by 37% on average), improves classification accuracy (by 2.1–4.8%), and is more robust to concept drift. This work provides a unified, efficient, and accessible infrastructure for dynamic learning across domains.
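The sliding-window evaluation mentioned in the summary can be illustrated with a small test-then-train (prequential) loop. The sketch below is plain Python and does not use CapyMOA's API; the names `MajorityClassLearner` and `prequential_windowed_accuracy`, the window size, and the toy stream are all assumptions made for illustration.

```python
from collections import deque


class MajorityClassLearner:
    """Toy incremental learner (hypothetical): predicts the most frequent label so far."""

    def __init__(self):
        self.counts = {}

    def predict(self, x):
        if not self.counts:
            return None  # nothing seen yet
        # Ties resolve to the label that was observed first (dict insertion order).
        return max(self.counts, key=self.counts.get)

    def train(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


def prequential_windowed_accuracy(stream, learner, window_size=4):
    """Test on each instance first, then train on it.

    Accuracy is reported over the last `window_size` predictions only, so the
    metric follows recent behaviour instead of averaging over the whole stream.
    """
    window = deque(maxlen=window_size)  # old outcomes fall out automatically
    accuracies = []
    for x, y in stream:
        window.append(learner.predict(x) == y)  # test
        learner.train(x, y)                     # then train
        accuracies.append(sum(window) / len(window))
    return accuracies


# Toy stream with an abrupt label change after six instances (a crude concept drift).
stream = [((i,), 0) for i in range(6)] + [((i,), 1) for i in range(6, 16)]
acc = prequential_windowed_accuracy(stream, MajorityClassLearner(), window_size=4)
```

On this toy stream the windowed accuracy climbs to 1.0 before the label change and falls to 0.0 shortly after it, which is the kind of drift-sensitive signal a cumulative accuracy would smooth away.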

📝 Abstract

CapyMOA is an open-source library designed for efficient machine learning on streaming data. It provides a structured framework for real-time learning and evaluation, featuring a flexible data representation. CapyMOA includes an extensible architecture that allows integration with external frameworks such as MOA and PyTorch, facilitating hybrid learning approaches that combine traditional online algorithms with deep learning techniques. By emphasizing adaptability, scalability, and usability, CapyMOA allows researchers and practitioners to tackle dynamic learning challenges across various domains.
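One way to picture an extensible architecture of this kind is a shared learner interface plus thin adapters around models from other frameworks. The sketch below is an assumption about the general pattern, not CapyMOA's actual design or API: `StreamLearner`, `BackendAdapter`, `NearestCentroidBackend`, and `count_correct` are hypothetical names, and the backend is a toy stand-in for an external model.

```python
from typing import Callable, Hashable, Iterable, Optional, Protocol, Tuple


class StreamLearner(Protocol):
    """Hypothetical common interface that every learner in the loop must satisfy."""

    def predict(self, x: float) -> Optional[Hashable]: ...
    def train(self, x: float, y: Hashable) -> None: ...


class NearestCentroidBackend:
    """Stand-in for a model from an external library, with its own method names."""

    def __init__(self) -> None:
        self.sums = {}
        self.counts = {}

    def partial_fit(self, x: float, y: Hashable) -> None:
        self.sums[y] = self.sums.get(y, 0.0) + x
        self.counts[y] = self.counts.get(y, 0) + 1

    def infer(self, x: float) -> Optional[Hashable]:
        if not self.counts:
            return None
        # Pick the class whose running mean is closest to x.
        return min(self.counts, key=lambda c: abs(x - self.sums[c] / self.counts[c]))


class BackendAdapter:
    """Maps an external model's callables onto the StreamLearner interface."""

    def __init__(self, predict_fn: Callable, train_fn: Callable) -> None:
        self._predict_fn = predict_fn
        self._train_fn = train_fn

    def predict(self, x: float) -> Optional[Hashable]:
        return self._predict_fn(x)

    def train(self, x: float, y: Hashable) -> None:
        self._train_fn(x, y)


def count_correct(learner: StreamLearner, stream: Iterable[Tuple[float, Hashable]]) -> int:
    """Test-then-train loop that only relies on the shared interface."""
    correct = 0
    for x, y in stream:
        correct += int(learner.predict(x) == y)
        learner.train(x, y)
    return correct


backend = NearestCentroidBackend()
learner = BackendAdapter(backend.infer, backend.partial_fit)
stream = [(1.0, "a"), (9.0, "b"), (2.0, "a"), (8.0, "b"), (1.5, "a")]
n_correct = count_correct(learner, stream)
```

Because the evaluation loop depends only on `predict` and `train`, a model from a different backend could be swapped in by writing another adapter, without changing the loop itself.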

Problem

Research questions and friction points this paper is trying to address.

Efficient machine learning on streaming data

Real-time learning and evaluation framework

Integration of traditional and deep learning techniques

Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-source library for streaming data

Extensible architecture for hybrid learning

Real-time learning and evaluation framework
