🤖 AI Summary
This study addresses the inherent tension between scalability and maintainability in machine learning (ML) systems, a critical challenge impeding robust, production-grade deployment. Method: We conduct a systematic literature review (SLR) grounded in 124 high-quality publications, developing the first six-dimensional analytical framework, which examines maintainability and scalability across data engineering, model engineering, and ML system development. Contribution/Results: The work identifies 41 categories of maintainability issues and 13 categories of scalability issues, uncovering their stage-crossing trade-offs and synergies. It introduces the first taxonomy of scalability–maintainability challenges in ML systems, accompanied by a problem distribution map and an evidence-based repository quantifying solution effectiveness. Collectively, these findings deliver empirically grounded, cross-stage design principles and actionable optimization pathways for industrial ML system development.
📝 Abstract
This systematic literature review examines the critical challenges and solutions related to scalability and maintainability in Machine Learning (ML) systems. As ML applications become increasingly complex and widespread across industries, the need to balance system scalability with long-term maintainability has emerged as a significant concern. This review synthesizes current research and practices addressing these dual challenges across the entire ML life-cycle, from data engineering to model deployment in production. We analyzed 124 papers to identify and categorize 41 maintainability challenges and 13 scalability challenges, along with their corresponding solutions. Our findings reveal intricate interdependencies between scalability and maintainability, where improvements in one often affect the other. The review is structured around six primary research questions, examining maintainability and scalability challenges in data engineering, model engineering, and ML system development. We explore how these challenges manifest differently across the stages of the ML life-cycle. This overview offers insights for both researchers and practitioners working on ML systems. It aims to guide future research directions, inform best practices, and contribute to the development of more robust, efficient, and sustainable ML applications across domains.