Stress-Testing ML Pipelines with Adversarial Data Corruption

📅 2025-06-02
🤖 AI Summary
Real-world structured data often suffer from demographically correlated missingness, biased labels, and systematic sampling bias, yet existing robustness evaluations rely on random or simplistic corruptions and fail to expose worst-case vulnerabilities of high-risk ML systems. Method: We propose SAVAGE, the first causality-driven, black-box, interpretable stress-testing framework for structured data. It models data dependencies via causal graphs and provides corruption templates that give structured-data contamination a causal representation. A novel bi-level optimization algorithm supports end-to-end, targeted vulnerability discovery, even for pipelines containing non-differentiable components. Results: Experiments show that just 5% contamination generated by SAVAGE induces catastrophic performance drops, significantly outperforming baseline corruption strategies. Moreover, SAVAGE reveals that core assumptions underlying mainstream data-cleaning and fairness-aware learning methods systematically fail under realistic data defects.

📝 Abstract
Structured data-quality issues, such as missing values correlated with demographics, culturally biased labels, or systemic selection biases, routinely degrade the reliability of machine-learning pipelines. Regulators now increasingly demand evidence that high-stakes systems can withstand these realistic, interdependent errors, yet current robustness evaluations typically use random or overly simplistic corruptions, leaving worst-case scenarios unexplored. We introduce SAVAGE, a causally inspired framework that (i) formally models realistic data-quality issues through dependency graphs and flexible corruption templates, and (ii) systematically discovers corruption patterns that maximally degrade a target performance metric. SAVAGE employs a bi-level optimization approach to efficiently identify vulnerable data subpopulations and fine-tune corruption severity, treating the full ML pipeline, including preprocessing and potentially non-differentiable models, as a black box. Extensive experiments across multiple datasets and ML tasks (data cleaning, fairness-aware learning, uncertainty quantification) demonstrate that even a small fraction (around 5%) of structured corruptions identified by SAVAGE severely impacts model performance, far exceeding the damage done by random or manually crafted errors, and invalidating core assumptions of existing techniques. Thus, SAVAGE provides a practical tool for rigorous pipeline stress-testing, a benchmark for evaluating robustness methods, and actionable guidance for designing more resilient data workflows.
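To make the abstract's bi-level, black-box idea concrete, here is a minimal sketch of what such a stress-test loop could look like. All names, the predicate/severity representation, and the exhaustive search are assumptions for illustration, not SAVAGE's actual algorithm:

```python
# Hedged sketch of black-box bi-level stress-testing (not the paper's code).
def stress_test(pipeline_score, rows, predicates, corrupt, severities, budget=0.05):
    """Outer level: choose which subpopulation to corrupt (a predicate).
    Inner level: tune how severely to corrupt it.
    The pipeline is queried only through `pipeline_score`, so it may contain
    non-differentiable components (cleaning steps, tree models, ...)."""
    best = None
    for pred in predicates:                            # outer: where to corrupt
        hit = [i for i, r in enumerate(rows) if pred(r)]
        hit = set(hit[:max(1, int(budget * len(rows)))])  # contamination budget
        for sev in severities:                         # inner: how severely
            corrupted = [corrupt(dict(r), sev) if i in hit else r
                         for i, r in enumerate(rows)]
            score = pipeline_score(corrupted)          # black-box evaluation
            if best is None or score < best[0]:        # keep the worst case found
                best = (score, pred, sev)
    return best  # (worst score, predicate, severity)
```

A real implementation would replace the exhaustive loops with a smarter search, but the structure shows why no gradients through the pipeline are needed: both levels only read the scalar score.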
Problem

Research questions and friction points this paper is trying to address.

Addressing structured data-quality issues in ML pipelines
Identifying worst-case corruption patterns affecting model performance
Providing a practical tool for rigorous pipeline stress-testing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causally inspired framework models data-quality issues
Bi-level optimization identifies vulnerable data subpopulations
Systematically discovers worst-case corruption patterns
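As a concrete illustration of the corruption-template idea, errors can be injected as a function of parent attributes in a dependency graph rather than uniformly at random. The sketch below shows demographically correlated (missing-not-at-random) missingness; the graph, attribute names, and rates are invented for illustration and are not the paper's template language:

```python
import random

# Hypothetical dependency graph: missingness in `income` depends on its
# causal parent `age_group`, mimicking demographically correlated gaps.
def mnar_template(row, rates, rng):
    """Blank out `income` with a probability that depends on `age_group`,
    instead of a single uniform missingness rate."""
    if rng.random() < rates.get(row["age_group"], 0.0):
        row = {**row, "income": None}  # return a corrupted copy
    return row

rng = random.Random(0)
data = [{"age_group": "young" if i % 2 else "senior", "income": 40000 + i}
        for i in range(1000)]
rates = {"senior": 0.4, "young": 0.05}  # corruption severity per subgroup
corrupted = [mnar_template(r, rates, rng) for r in data]
```

A template like this is what makes the contamination "structured": a downstream imputer that assumes values are missing completely at random would be systematically misled on the `senior` subgroup.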