🤖 AI Summary
Machine learning systems face robustness challenges including silent failures, out-of-distribution (OOD) inputs, and adversarial attacks. To address these, this paper proposes ML-On-Rails, a standardized, engineering-oriented protocol. It introduces an HTTP status code–based model-to-software communication mechanism that enables structured, interpretable reporting of error types, prediction confidence, and decision rationales. The protocol integrates OOD detection, adversarial example identification, input validation, and local interpretability techniques, with its design grounded in an in-depth industrial practitioner survey. The survey findings indicate that a standardized protocol such as ML-On-Rails can improve fault observability and response consistency in ML systems, bridging a critical standardization gap in ML robustness assurance. It offers a practical, deployable engineering paradigm for building trustworthy ML systems.
📝 Abstract
Ensuring robustness in ML-enabled software systems requires addressing critical challenges, such as silent failures, out-of-distribution (OOD) data, and adversarial attacks. Traditional software engineering practices, which rely on predefined logic, are insufficient for ML components that depend on data and probabilistic decision-making. To address these challenges, we propose the ML-On-Rails protocol, a unified framework designed to enhance the robustness and trustworthiness of ML-enabled systems in production. This protocol integrates key safeguards such as OOD detection, adversarial attack detection, input validation, and explainability. It also includes a model-to-software communication framework using HTTP status codes to enhance transparency in reporting model outcomes and errors. To align our approach with real-world challenges, we conducted a practitioner survey, which revealed major robustness issues and gaps in current solutions, and highlighted how a standardised protocol such as ML-On-Rails can improve system robustness. Our findings highlight the need for more support and resources for engineers working with ML systems. Finally, we outline future directions for refining the proposed protocol, leveraging insights from the survey and real-world applications to continually enhance its effectiveness.
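To make the model-to-software communication idea concrete, the sketch below shows how safeguard outcomes could be reported through HTTP-style status codes alongside a machine-readable body. This is a minimal illustration, not the paper's actual implementation: the specific code assignments (400 for invalid input, 403 for suspected adversarial examples, 422 for OOD inputs) and the function names (`guarded_predict`, `validate`, `detect_ood`, `detect_adversarial`) are assumptions chosen for the example.

```python
from http import HTTPStatus

# Hypothetical mapping from safeguard outcomes to HTTP status codes;
# the paper does not prescribe these exact codes.
SAFEGUARD_STATUS = {
    "ok": HTTPStatus.OK,                                      # passed all rails
    "invalid_input": HTTPStatus.BAD_REQUEST,                  # input validation failed
    "adversarial": HTTPStatus.FORBIDDEN,                      # suspected adversarial example
    "out_of_distribution": HTTPStatus.UNPROCESSABLE_ENTITY,   # OOD detector fired
}

def guarded_predict(model, validate, detect_adversarial, detect_ood, x):
    """Run the safeguards in order; return (status, body) where body carries
    either the prediction with its confidence or a structured error rationale."""
    if not validate(x):
        return SAFEGUARD_STATUS["invalid_input"], {"error": "input validation failed"}
    if detect_adversarial(x):
        return SAFEGUARD_STATUS["adversarial"], {"error": "suspected adversarial input"}
    if detect_ood(x):
        return SAFEGUARD_STATUS["out_of_distribution"], {"error": "out-of-distribution input"}
    label, confidence = model(x)
    return SAFEGUARD_STATUS["ok"], {"prediction": label, "confidence": confidence}
```

A calling service can then branch on the status code alone (retry, fall back, or alert) without parsing model internals, which is the kind of structured, transparent error reporting the protocol aims for.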