🤖 AI Summary
This work addresses the prevalent misuse of machine learning (ML) cloud services in software systems, which often degrades system quality and maintainability and for which effective detection mechanisms have been lacking. To tackle this issue, the authors propose MLmisFinder, the first approach to systematically define and automatically identify seven representative categories of ML service misuse. The method integrates meta-modeling with rule-driven static analysis and introduces specialized detection algorithms tailored to scenarios such as data drift monitoring and schema validation. Evaluated on 107 open-source projects, MLmisFinder achieves an average precision of 96.7% and recall of 97%, substantially outperforming baseline techniques. Furthermore, its successful application to 817 additional systems demonstrates the widespread nature of ML service misuse in real-world software.
📝 Abstract
Machine Learning (ML) cloud services, offered by leading providers such as Amazon, Google, and Microsoft, enable the integration of ML components into software systems without building models from scratch. However, the rapid adoption of ML services, coupled with the growing complexity of business requirements, has led to widespread misuses, compromising the quality, maintainability, and evolution of ML service-based systems. Though prior research has studied patterns and antipatterns in service-based and ML-based systems separately, automatic detection of ML service misuses remains a challenge. In this paper, we propose MLmisFinder, an automatic approach to detect ML service misuses in software systems, aiming to identify instances of improper use of ML services to help developers properly integrate ML components in ML service-based systems. We propose a metamodel that captures the data needed to detect misuses in ML service-based systems and apply a set of rule-based detection algorithms for seven misuse types. We evaluated MLmisFinder on 107 software systems collected from open-source GitHub repositories and compared it with a state-of-the-art baseline. Our results show that MLmisFinder effectively detects ML service misuses, achieving an average precision of 96.7% and recall of 97%, outperforming the state-of-the-art baseline. MLmisFinder also scaled efficiently to detect misuses across 817 ML service-based systems and revealed that such misuses are widespread, especially in areas such as data drift monitoring and schema validation.
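To give a flavor of what rule-based static detection of an ML service misuse can look like, here is a minimal, hypothetical sketch. The abstract does not disclose MLmisFinder's actual rules or the client APIs it targets, so the rule below (flagging functions that call an ML service's `predict` endpoint without first invoking a schema-validation helper), along with the names `predict` and `validate_schema`, are illustrative assumptions only, implemented over Python's standard `ast` module:

```python
import ast

# Hypothetical misuse rule in the spirit of rule-based static analysis:
# a function that sends data to an ML service's predict endpoint without
# any schema-validation call is flagged as a potential misuse.
ML_PREDICT_CALLS = {"predict"}           # assumed ML-service invocation names
SCHEMA_VALIDATORS = {"validate_schema"}  # assumed validation helper names

def find_missing_schema_validation(source: str) -> list[str]:
    """Return names of functions that call predict() without validating input."""
    tree = ast.parse(source)
    flagged = []
    for func in (n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)):
        # Collect the bare names of all calls inside this function:
        # attribute calls like client.predict(...) and plain calls
        # like validate_schema(...).
        calls = {
            n.func.attr if isinstance(n.func, ast.Attribute)
            else getattr(n.func, "id", "")
            for n in ast.walk(func) if isinstance(n, ast.Call)
        }
        if (ML_PREDICT_CALLS & calls) and not (SCHEMA_VALIDATORS & calls):
            flagged.append(func.name)
    return flagged

sample = """
def good(client, data):
    validate_schema(data)
    return client.predict(data)

def bad(client, data):
    return client.predict(data)
"""
print(find_missing_schema_validation(sample))  # ['bad']
```

A real detector built on a metamodel would of course work over richer program facts (data flow, service configuration, call order) rather than a flat set of call names, but the shape is the same: extract facts from the code, then match each misuse type's rule against them.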