🤖 AI Summary
Existing ML monitoring systems detect data distribution shifts but struggle to automatically identify their root causes -- such as data defects, software failures, or genuine concept drift -- relying instead on manual root-cause analysis. To address this, we propose a causal graph framework for ML systems, the first to integrate causal modeling with hierarchical system abstraction. It constructs a multi-granularity dependency graph spanning environmental variables, data pipelines, and model components. By analyzing propagation paths, the framework enables traceable attribution of distribution shifts back to their origins, systematically mapping observed distributional changes to their underlying causes and advancing ML monitoring from merely detecting *whether* a shift occurs to explaining *why* it occurs. The framework provides both theoretical foundations and a practical technical pathway toward interpretable, traceable, and automated ML monitoring.
📝 Abstract
Monitoring machine learning (ML) systems is hard, with standard practice focusing on detecting distribution shifts rather than their causes. Root-cause analysis often relies on manual tracing to determine whether a shift is caused by software faults, data-quality issues, or natural change. We propose ML System Maps -- causal maps that, through layered views, make explicit the propagation paths between the environment and the ML system's internals, enabling systematic attribution of distribution shifts. We outline the approach and a research agenda for its development and evaluation.
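The core idea -- a layered causal dependency graph whose propagation paths are traced upstream from a detected shift to candidate root causes -- can be sketched in a few lines. The node names, layer prefixes, and graph below are purely illustrative assumptions, not structures defined in the paper:

```python
from collections import defaultdict

# Hypothetical "ML System Map": a causal dependency graph whose nodes
# span three layers -- environment (env:), data pipeline (data:), and
# model (model:). Edges point in the direction a shift propagates.
EDGES = {
    "env:user_behavior":    ["data:click_logs"],
    "env:sensor_firmware":  ["data:telemetry"],
    "data:click_logs":      ["data:feature_table"],
    "data:telemetry":       ["data:feature_table"],
    "data:feature_table":   ["model:input_features"],
    "model:input_features": ["model:predictions"],
}

def parents_of(edges):
    """Invert the child lists so each node maps to its direct causes."""
    parents = defaultdict(list)
    for src, dsts in edges.items():
        for dst in dsts:
            parents[dst].append(src)
    return parents

def trace_root_causes(shifted_node, edges):
    """Walk propagation paths upstream from a node where a distribution
    shift was detected; ancestors with no parents of their own are the
    candidate root causes (e.g. an environment change, as opposed to a
    defect introduced in an intermediate pipeline stage)."""
    parents = parents_of(edges)
    seen, stack, roots = set(), [shifted_node], []
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        ups = parents.get(node, [])
        if not ups and node != shifted_node:
            roots.append(node)
        stack.extend(ups)
    return sorted(roots)

print(trace_root_causes("model:predictions", EDGES))
# -> ['env:sensor_firmware', 'env:user_behavior']
```

In this toy graph, a shift observed at `model:predictions` is attributed to both environment-layer ancestors; a real system map would additionally attach shift detectors to intermediate nodes so that attribution can distinguish, say, a pipeline defect at `data:feature_table` from genuine upstream change.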