🤖 AI Summary
This paper addresses the high time and space overheads incurred by data materialization (i.e., movement or replication) during machine learning (ML) training over heterogeneous data sources. To this end, it proposes the first GPU-accelerated automated factorized learning framework. Departing from conventional relational joins and manual algorithm adaptation, the framework unifies heterogeneous data representations via matrix metadata modeling, and integrates linear-algebraic rewriting, ML-driven dynamic materialization/factorization decisions, and GPU-fused operator execution for CPU/GPU co-optimization. Key contributions include: (1) the first extension of factorized learning to a GPU-hardware-friendly paradigm; (2) a joint data-model cost estimator that guides execution strategy selection; and (3) end-to-end automated optimization. Experiments demonstrate up to 8.9× speedup per GPU operator and over 20% end-to-end acceleration for batched ML training, significantly improving training efficiency and practicality across heterogeneous data and multi-hardware environments.
📝 Abstract
Machine learning (ML) training over disparate data sources traditionally involves materialization, which can impose substantial time and space overhead due to data movement and replication. Factorized learning, which leverages direct computation on disparate sources through linear algebra (LA) rewriting, has emerged as a viable alternative for improving computational efficiency. However, adapting factorized learning to exploit the full capabilities of modern LA-friendly hardware like GPUs has been limited, often requiring manual intervention for algorithm compatibility. This paper introduces Ilargi, a novel factorized learning framework that utilizes matrix-represented data integration (DI) metadata to facilitate automatic factorization across CPU and GPU environments without the need for costly relational joins. Ilargi incorporates an ML-based cost estimator to intelligently select between factorization and materialization based on data properties, algorithm complexity, hardware environments, and their interactions. This strategy delivers up to 8.9x speedups on GPUs and over 20% acceleration in batch ML training workloads, thereby enhancing the practicality of ML training across diverse data integration scenarios and hardware platforms. To our knowledge, this work is the first effort in GPU-compatible factorized learning.
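To make the core idea concrete, the following is a minimal NumPy sketch of the classic factorized-learning rewrite over a key-foreign-key join. All names, shapes, and data here are illustrative assumptions, not Ilargi's actual implementation: the point is only that a matrix product over the materialized join can be pushed through to the base tables via LA rewriting, avoiding construction of the joined design matrix.

```python
import numpy as np

# Toy setup (hypothetical data): entity table S joins attribute table R
# through a foreign key. The materialized design matrix is T = [S | K @ R],
# where K is a 0/1 indicator matrix encoding the join.
rng = np.random.default_rng(0)
n, dS, m, dR = 6, 2, 3, 4           # 6 rows in S, 3 rows in R
S = rng.standard_normal((n, dS))
R = rng.standard_normal((m, dR))
fk = rng.integers(0, m, size=n)     # foreign-key column of S
K = np.eye(m)[fk]                   # n x m join indicator matrix

w = rng.standard_normal(dS + dR)    # model weights

# Materialized approach: join first, then multiply.
T = np.hstack([S, K @ R])
y_mat = T @ w

# Factorized rewrite: push the multiplication through the join.
# R @ w[dS:] is computed once per row of R, not once per joined row,
# which removes the redundancy when n >> m.
y_fac = S @ w[:dS] + K @ (R @ w[dS:])

assert np.allclose(y_mat, y_fac)
```

Since GPUs excel at exactly these dense matrix products, the rewrite maps naturally onto GPU operators; the framework's cost estimator then decides, per workload, whether the factorized or the materialized form is cheaper.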