Monitoring and Observability of Machine Learning Systems: Current Practices and Gaps

📅 2025-10-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the critical challenge of “silent failures” (erroneous model decisions without system crashes) in production machine learning systems, which undermine conventional monitoring and expose a gap in empirically grounded observability practices. Through seven cross-industry focus group interviews, we applied qualitative thematic coding and scenario mapping to systematically identify the types of observability data practitioners collect and their concrete uses in model validation, anomaly detection, and root-cause diagnosis. Our findings constitute the first empirical characterization of key blind spots in current ML observability tooling: delayed response to feature drift, lack of decision traceability, and difficulty quantifying business impact. Based on these insights, we propose three foundational design principles for next-generation observability tools (explainability-awareness, causal attribution support, and business-impact alignment) and establish an empirically anchored theoretical foundation for future evaluation frameworks and standardization efforts.
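To make the first of these blind spots concrete: a common way to check feature drift is a batch two-sample test, which by construction cannot fire until a full window of production data has accumulated, hence the delayed response. The sketch below is a minimal illustration of such a check using a Kolmogorov-Smirnov test; the window size, threshold, and synthetic data are illustrative assumptions, not taken from the paper.

```python
# Minimal batch drift check (illustrative assumptions throughout):
# the test can only run once WINDOW observations have arrived, so a
# fast-moving shift goes unflagged for the whole batch.
import numpy as np
from scipy.stats import ks_2samp

WINDOW = 1_000   # assumed batch size before a check can run
ALPHA = 0.01     # assumed significance threshold

def drift_check(reference: np.ndarray, window: np.ndarray) -> bool:
    """Two-sample Kolmogorov-Smirnov test between training-time and
    production values of one feature; True means drift is flagged."""
    stat, p_value = ks_2samp(reference, window)
    return p_value < ALPHA

reference = np.random.normal(0.0, 1.0, size=10_000)   # training-time feature
production = np.random.normal(0.5, 1.0, size=WINDOW)  # shifted live feature
print(drift_check(reference, production))             # likely True
```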

📝 Abstract
Production machine learning (ML) systems fail silently -- not with crashes, but through wrong decisions. While observability is recognized as critical for ML operations, there is a lack of empirical evidence on what practitioners actually capture. This study presents empirical results on ML observability in practice, drawn from seven focus group sessions across several domains. We catalog the information practitioners systematically capture about ML systems and their environment, and map how they use it to validate models, detect and diagnose faults, and explain observed degradations. Finally, we identify gaps in current practice and outline implications for tooling design and research to establish ML observability practices.
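As an illustration of the kind of capture the abstract refers to, the sketch below logs one structured record per model decision so that a later fault diagnosis can reconstruct exactly what the model saw and decided. It is a minimal example assuming a JSON-lines sink; the field names and the fraud scenario are hypothetical, not the paper's catalog.

```python
# Minimal per-decision trace logging (hypothetical schema), so that a
# later root-cause analysis can replay the inputs behind a bad decision.
import json
import time
import uuid

def log_decision(sink, model_version: str, features: dict, prediction, score: float):
    """Append one structured trace record for a single model decision."""
    record = {
        "trace_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_version": model_version,
        "features": features,       # input snapshot at decision time
        "prediction": prediction,
        "score": score,
    }
    sink.write(json.dumps(record) + "\n")

with open("decisions.jsonl", "a") as sink:
    log_decision(sink, "fraud-v3.2", {"amount": 412.0, "country": "DE"}, "reject", 0.91)
```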
Problem

Research questions and friction points this paper is trying to address.

Investigating current practices in ML system monitoring and observability
Identifying gaps between theoretical importance and actual implementation
Cataloging information captured for model validation and fault diagnosis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cataloging systematically captured ML system information
Mapping usage for model validation and fault diagnosis
Identifying gaps to guide observability tooling design (one such gap, business-impact alignment, is sketched below)
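The third blind spot, difficulty quantifying business impact, can be illustrated by weighting model errors by their cost instead of counting them uniformly. The sketch below is a hypothetical example; the per-error costs and the fraud setting are assumptions for illustration, not results from the study.

```python
# Translate a batch of predictions into an estimated monetary loss
# (illustrative costs): a missed fraud case (false negative) is assumed
# to hurt far more than a wrongly blocked transaction (false positive).
COST = {"false_negative": 250.0, "false_positive": 4.0}

def business_impact(y_true: list[int], y_pred: list[int]) -> float:
    """Sum the assumed cost of each false negative and false positive."""
    loss = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 0:
            loss += COST["false_negative"]
        elif truth == 0 and pred == 1:
            loss += COST["false_positive"]
    return loss

# Two false negatives and one false positive -> 504.0
print(business_impact([1, 1, 0, 0, 1], [0, 0, 1, 0, 1]))
```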