🤖 AI Summary
This work addresses a gap in machine learning security: systems often face concurrent threats to robustness, privacy, and fairness, yet existing defenses typically target a single risk and are rarely evaluated systematically when deployed together. The authors propose Landseer, a modular framework that encapsulates defenses as containerized components within a unified, reproducible platform, with an automated experimentation engine and a multi-metric evaluation suite. The platform supports flexible composition and joint assessment of defenses across the machine learning lifecycle. In a preliminary study, the authors identify 35 state-of-the-art defenses, filter them for reproducibility, and evaluate those that remain. The results reveal gaps in replicability across defense families and highlight integration challenges when defenses are combined, laying a foundation for machine learning systems that meet multiple security and trustworthiness objectives at once.
📝 Abstract
Machine learning systems face diverse threats that undermine robustness, privacy, and fairness. Although many defenses have been proposed, each typically addresses a single risk in isolation. Real-world deployments, however, require these defenses to be composed to meet multiple guarantees simultaneously. The process of composing defenses is complex and not well understood, and its impact on performance and security remains unclear.
We present Landseer, a modular framework for integrating machine learning (ML) defenses into the ML lifecycle and systematically evaluating their composition. Landseer encapsulates defenses as containerized modules, allowing existing and new techniques to be plugged in with minimal effort. Its evaluation engine automates experiments across multiple metrics, supporting the study of defenses both individually and in combination. In a preliminary study, we identified 35 state-of-the-art ML defenses, filtered them for reproducibility, and analyzed the performance of those that remained using Landseer's unified evaluation process.
Our findings reveal gaps in replicability across defense families and provide insights into the challenges and opportunities in integrating multiple defenses, establishing a foundation for improving the reliability of machine learning systems.
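To make the idea of composing defenses across the ML lifecycle concrete, the following is a minimal sketch, not Landseer's actual interface. It assumes a hypothetical three-stage lifecycle (`pre_training`, `in_training`, `post_training`), at most one defense per stage, and an illustrative catalog of four defenses; the names `Defense`, `STAGES`, `enumerate_compositions`, and `goals_covered` are all invented for this example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Defense:
    """Hypothetical descriptor for one pluggable defense module."""
    name: str
    stage: str  # lifecycle stage where the defense is applied (assumed)
    goal: str   # property it targets: robustness, privacy, or fairness


# Assumed lifecycle stages; the abstract does not enumerate them.
STAGES = ("pre_training", "in_training", "post_training")


def enumerate_compositions(defenses):
    """Yield every pipeline that uses at most one defense per stage."""
    options = []
    for stage in STAGES:
        # None stands for "no defense at this stage".
        options.append([None] + [d for d in defenses if d.stage == stage])
    for combo in product(*options):
        yield tuple(d for d in combo if d is not None)


def goals_covered(pipeline):
    """Return the set of properties addressed by a pipeline."""
    return {d.goal for d in pipeline}


# Illustrative catalog only; not the set of defenses studied in the paper.
CATALOG = [
    Defense("outlier_filter", "pre_training", "robustness"),
    Defense("dp_sgd", "in_training", "privacy"),
    Defense("adversarial_training", "in_training", "robustness"),
    Defense("threshold_adjustment", "post_training", "fairness"),
]

PIPELINES = list(enumerate_compositions(CATALOG))
FULL_COVERAGE = [
    p for p in PIPELINES
    if goals_covered(p) == {"robustness", "privacy", "fairness"}
]
```

Even this toy catalog produces a dozen candidate pipelines, only one of which covers all three properties, which hints at why an automated engine is useful for running and comparing compositions at scale.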