🤖 AI Summary
Machine learning systems face robustness challenges including silent failures, out-of-distribution (OOD) inputs, and adversarial attacks. To address these, this paper proposes ML-On-Rails, a standardized, engineering-oriented protocol. It introduces an HTTP status code–based model-to-software communication mechanism that enables structured, interpretable reporting of error types, prediction confidence, and decision rationales. The protocol integrates OOD detection, adversarial example identification, input validation, and local interpretability techniques, with its design grounded in an in-depth industrial practitioner survey. The survey findings indicate that a standardized protocol such as ML-On-Rails can improve fault observability and response consistency in ML systems, bridging a critical standardization gap in ML robustness assurance. It offers a practical, deployable engineering paradigm for building trustworthy ML systems.
📝 Abstract
Ensuring robustness in ML-enabled software systems requires addressing critical challenges, such as silent failures, out-of-distribution (OOD) data, and adversarial attacks. Traditional software engineering practices, which rely on predefined logic, are insufficient for ML components that depend on data and probabilistic decision-making. To address these challenges, we propose the ML-On-Rails protocol, a unified framework designed to enhance the robustness and trustworthiness of ML-enabled systems in production. This protocol integrates key safeguards such as OOD detection, adversarial attack detection, input validation, and explainability. It also includes a model-to-software communication framework using HTTP status codes to enhance transparency in reporting model outcomes and errors. To align our approach with real-world challenges, we conducted a practitioner survey, which revealed major robustness issues and gaps in current solutions, and highlighted how a standardised protocol such as ML-On-Rails can improve system robustness. Our findings highlight the need for more support and resources for engineers working with ML systems. Finally, we outline future directions for refining the proposed protocol, leveraging insights from the survey and real-world applications to continually enhance its effectiveness.
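To make the model-to-software communication idea concrete, the sketch below shows how safeguard outcomes could be reported through HTTP-style status codes alongside a machine-readable body. This is a minimal illustration, not the paper's actual implementation: the specific code assignments (400 for invalid input, 403 for suspected adversarial examples, 422 for OOD inputs) and the function names (`guarded_predict`, `validate`, `detect_ood`, `detect_adversarial`) are assumptions chosen for the example.

```python
from http import HTTPStatus

# Hypothetical mapping from safeguard outcomes to HTTP status codes;
# the paper does not prescribe these exact codes.
SAFEGUARD_STATUS = {
    "ok": HTTPStatus.OK,                                      # passed all rails
    "invalid_input": HTTPStatus.BAD_REQUEST,                  # input validation failed
    "adversarial": HTTPStatus.FORBIDDEN,                      # suspected adversarial example
    "out_of_distribution": HTTPStatus.UNPROCESSABLE_ENTITY,   # OOD detector fired
}

def guarded_predict(model, validate, detect_adversarial, detect_ood, x):
    """Run the safeguards in order; return (status, body) where body carries
    either the prediction with its confidence or a structured error rationale."""
    if not validate(x):
        return SAFEGUARD_STATUS["invalid_input"], {"error": "input validation failed"}
    if detect_adversarial(x):
        return SAFEGUARD_STATUS["adversarial"], {"error": "suspected adversarial input"}
    if detect_ood(x):
        return SAFEGUARD_STATUS["out_of_distribution"], {"error": "out-of-distribution input"}
    label, confidence = model(x)
    return SAFEGUARD_STATUS["ok"], {"prediction": label, "confidence": confidence}
```

A calling service can then branch on the status code alone (retry, fall back, or alert) without parsing model internals, which is the kind of structured, transparent error reporting the protocol aims for.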