Stress-Testing ML Pipelines with Adversarial Data Corruption

📅 2025-06-02
🤖 AI Summary
Real-world structured data often suffer from demographically correlated missingness, biased labels, and systematic sampling bias, yet existing robustness evaluations rely on random or simplistic corruptions and fail to expose worst-case vulnerabilities of high-risk ML systems. Method: We propose SAVAGE, the first causality-driven, black-box, interpretable stress-testing framework for structured data. It models data dependencies via causal graphs and provides corruption templates that give structured-data contamination a causal representation. A novel bi-level optimization algorithm supports end-to-end, targeted vulnerability discovery, even for pipelines containing non-differentiable components. Results: Experiments show that just 5% contamination generated by SAVAGE induces catastrophic performance drops, significantly outperforming baseline corruption strategies. Moreover, SAVAGE reveals that core assumptions underlying mainstream data-cleaning and fairness-aware learning methods systematically fail under realistic data defects.

📝 Abstract
Structured data-quality issues, such as missing values correlated with demographics, culturally biased labels, or systemic selection biases, routinely degrade the reliability of machine-learning pipelines. Regulators now increasingly demand evidence that high-stakes systems can withstand these realistic, interdependent errors, yet current robustness evaluations typically use random or overly simplistic corruptions, leaving worst-case scenarios unexplored. We introduce SAVAGE, a causally inspired framework that (i) formally models realistic data-quality issues through dependency graphs and flexible corruption templates, and (ii) systematically discovers corruption patterns that maximally degrade a target performance metric. SAVAGE employs a bi-level optimization approach to efficiently identify vulnerable data subpopulations and fine-tune corruption severity, treating the full ML pipeline, including preprocessing and potentially non-differentiable models, as a black box. Extensive experiments across multiple datasets and ML tasks (data cleaning, fairness-aware learning, uncertainty quantification) demonstrate that even a small fraction (around 5%) of structured corruptions identified by SAVAGE severely impacts model performance, far exceeding the damage done by random or manually crafted errors, and invalidating core assumptions of existing techniques. Thus, SAVAGE provides a practical tool for rigorous pipeline stress-testing, a benchmark for evaluating robustness methods, and actionable guidance for designing more resilient data workflows.
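To make the abstract's bi-level, black-box idea concrete, here is a minimal sketch of what such a stress-test loop could look like. All names, the predicate/severity representation, and the exhaustive search are assumptions for illustration, not SAVAGE's actual algorithm:

```python
# Hedged sketch of black-box bi-level stress-testing (not the paper's code).
def stress_test(pipeline_score, rows, predicates, corrupt, severities, budget=0.05):
    """Outer level: choose which subpopulation to corrupt (a predicate).
    Inner level: tune how severely to corrupt it.
    The pipeline is queried only through `pipeline_score`, so it may contain
    non-differentiable components (cleaning steps, tree models, ...)."""
    best = None
    for pred in predicates:                            # outer: where to corrupt
        hit = [i for i, r in enumerate(rows) if pred(r)]
        hit = set(hit[:max(1, int(budget * len(rows)))])  # contamination budget
        for sev in severities:                         # inner: how severely
            corrupted = [corrupt(dict(r), sev) if i in hit else r
                         for i, r in enumerate(rows)]
            score = pipeline_score(corrupted)          # black-box evaluation
            if best is None or score < best[0]:        # keep the worst case found
                best = (score, pred, sev)
    return best  # (worst score, predicate, severity)
```

A real implementation would replace the exhaustive loops with a smarter search, but the structure shows why no gradients through the pipeline are needed: both levels only read the scalar score.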
Problem

Research questions and friction points this paper is trying to address.

Addressing structured data-quality issues in ML pipelines
Identifying worst-case corruption patterns affecting model performance
Providing a practical tool for rigorous pipeline stress-testing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Causally inspired framework models data-quality issues
Bi-level optimization identifies vulnerable data subpopulations
Systematically discovers worst-case corruption patterns
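As a concrete illustration of the corruption-template idea, errors can be injected as a function of parent attributes in a dependency graph rather than uniformly at random. The sketch below shows demographically correlated (missing-not-at-random) missingness; the graph, attribute names, and rates are invented for illustration and are not the paper's template language:

```python
import random

# Hypothetical dependency graph: missingness in `income` depends on its
# causal parent `age_group`, mimicking demographically correlated gaps.
def mnar_template(row, rates, rng):
    """Blank out `income` with a probability that depends on `age_group`,
    instead of a single uniform missingness rate."""
    if rng.random() < rates.get(row["age_group"], 0.0):
        row = {**row, "income": None}  # return a corrupted copy
    return row

rng = random.Random(0)
data = [{"age_group": "young" if i % 2 else "senior", "income": 40000 + i}
        for i in range(1000)]
rates = {"senior": 0.4, "young": 0.05}  # corruption severity per subgroup
corrupted = [mnar_template(r, rates, rng) for r in data]
```

A template like this is what makes the contamination "structured": a downstream imputer that assumes values are missing completely at random would be systematically misled on the `senior` subgroup.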