🤖 AI Summary
This work addresses a critical gap in the testing of machine learning (ML) components: existing approaches focus predominantly on model performance while neglecting system-level quality attributes such as throughput, resource consumption, and robustness, which often leads to integration failures. To bridge this gap, the paper proposes the first standalone quality model tailored specifically to ML components. Grounded in the ISO/IEC 25010 quality standard and informed by requirements engineering and software quality modeling techniques, the model decouples and structures the key quality attributes of ML components, addressing the lack of component-level applicability in ISO/IEC 25059. It gives developers and stakeholders a shared vocabulary for prioritizing testing efforts. The model was validated through a user survey and has been integrated into an open-source ML testing tool, demonstrating its practical applicability.
📝 Abstract
Despite increased adoption of and advances in machine learning (ML), studies show that many ML prototypes never reach production and that testing remains largely limited to model properties, such as model performance, without considering requirements derived from the system the model will be part of, such as throughput, resource consumption, or robustness. This limited view of testing leads to failures in model integration, deployment, and operations. In traditional software development, quality models such as ISO 25010 provide a widely used structured framework to assess software quality, define quality requirements, and establish a common language for communication with stakeholders. A newer standard, ISO 25059, defines a more specific quality model for AI systems. However, this standard combines system attributes with ML component attributes, which is not helpful for a model developer, as many system attributes cannot be assessed at the component level. In this paper, we present a quality model for ML components that serves as a guide for requirements elicitation and negotiation and provides a common vocabulary for ML component developers and system stakeholders to agree on and define system-derived requirements and focus their testing efforts accordingly. The quality model was validated through a survey in which the participants agreed with its relevance and value. It has also been successfully integrated into an open-source tool for ML component testing and evaluation, demonstrating its practical application.