Pruning Cannot Hurt Robustness: Certified Trade-offs in Reinforcement Learning

📅 2025-10-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the impact of model pruning on policy robustness in state-adversarial Markov decision processes (SA-MDPs). Addressing the central question, "Does pruning degrade certified robustness?", we establish the first theoretical framework for pruning in adversarial reinforcement learning, proving that element-wise pruning can only tighten certified robustness bounds. Methodologically, we study magnitude-based and micro-pruning through a three-term regret decomposition, using Lipschitz-constrained Gaussian and categorical policies to jointly characterize clean-task performance, pruning-induced loss, and robustness gain. Experiments demonstrate that moderate sparsity substantially improves certified robustness while preserving, and sometimes even enhancing, clean-task performance. Our results chart the performance-robustness trade-off frontier and establish pruning as more than a compression technique: it serves as a structured mechanism for robustness optimization.
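The tightening claim can be made concrete. Below is a minimal sketch (not the authors' implementation) under the assumption that the certificate bounds the policy network's Lipschitz constant by the product of per-layer ℓ∞-induced operator norms (maximum absolute row sums); since zeroing a weight can never increase a row's absolute sum, element-wise pruning can only shrink this particular bound.

```python
import numpy as np

def linf_operator_norm(W):
    # ||W||_inf = maximum absolute row sum; monotone under zeroing entries.
    return np.abs(W).sum(axis=1).max()

def magnitude_prune(W, sparsity):
    # Element-wise magnitude pruning: zero out the smallest-magnitude weights.
    k = int(sparsity * W.size)
    if k == 0:
        return W.copy()
    threshold = np.sort(np.abs(W), axis=None)[k - 1]
    return np.where(np.abs(W) <= threshold, 0.0, W)

def lipschitz_certificate(weights):
    # Crude Lipschitz upper bound for a network with 1-Lipschitz activations:
    # the product of per-layer operator norms.
    return np.prod([linf_operator_norm(W) for W in weights])

rng = np.random.default_rng(0)
layers = [rng.normal(size=(64, 64)) for _ in range(3)]
dense = lipschitz_certificate(layers)
pruned = lipschitz_certificate([magnitude_prune(W, 0.5) for W in layers])
assert pruned <= dense  # this style of certificate can only tighten
print(f"dense bound: {dense:.2f}, pruned bound: {pruned:.2f}")
```

One caveat worth noting: this monotonicity holds for entrywise-monotone norms such as the ℓ∞- or ℓ1-induced norms; a spectral-norm certificate is not monotone under zeroing in general, so the choice of certificate matters.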

📝 Abstract
Reinforcement learning (RL) policies deployed in real-world environments must remain reliable under adversarial perturbations. At the same time, modern deep RL agents are heavily over-parameterized, raising costs and fragility concerns. While pruning has been shown to improve robustness in supervised learning, its role in adversarial RL remains poorly understood. We develop the first theoretical framework for certified robustness under pruning in state-adversarial Markov decision processes (SA-MDPs). For Gaussian and categorical policies with Lipschitz networks, we prove that element-wise pruning can only tighten certified robustness bounds; pruning never makes the policy less robust. Building on this, we derive a novel three-term regret decomposition that disentangles clean-task performance, pruning-induced performance loss, and robustness gains, exposing a fundamental performance-robustness frontier. Empirically, we evaluate magnitude and micro-pruning schedules on continuous-control benchmarks with strong policy-aware adversaries. Across tasks, pruning consistently uncovers reproducible "sweet spots" at moderate sparsity levels, where robustness improves substantially without harming, and sometimes even enhancing, clean performance. These results position pruning not merely as a compression tool but as a structural intervention for robust RL.
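For context on the pruning schedules mentioned above: a common shape for a gradual magnitude-pruning schedule, used here purely as an illustration (the paper's exact schedules are not specified in this summary), is the cubic sparsity ramp of Zhu & Gupta (2017):

```python
def sparsity_at(step, total_steps, final_sparsity, initial_sparsity=0.0):
    # Cubic sparsity ramp: sparsity rises quickly early in training and
    # flattens as it approaches the final target.
    t = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - t) ** 3
```

At each pruning step, weights below the magnitude threshold implied by the current sparsity target are zeroed, as in the sketch above.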
Problem

Research questions and friction points this paper is trying to address.

Establishes a theoretical framework for certified robustness of pruned policies in adversarial RL
Proves that element-wise pruning tightens robustness bounds without compromising policy reliability
Identifies performance-robustness trade-offs through a novel regret decomposition analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Element-wise pruning provably tightens certified robustness bounds
Three-term regret decomposition reveals the performance-robustness frontier (sketched after this list)
Pruning uncovers robustness sweet spots at moderate sparsity without loss of clean performance
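One plausible shape for such a decomposition, written here as an illustrative telescoping identity rather than the paper's exact statement: with $J$ the clean return, $\tilde{J}$ the return under worst-case state perturbations, $\pi$ the dense policy, $\pi_s$ its pruned counterpart at sparsity $s$, and $\pi^\star$ an optimal clean policy,

$$
\underbrace{J(\pi^\star) - \tilde{J}(\pi_s)}_{\text{adversarial regret}}
= \underbrace{J(\pi^\star) - J(\pi)}_{\text{clean-task regret}}
+ \underbrace{J(\pi) - J(\pi_s)}_{\text{pruning-induced loss}}
+ \underbrace{J(\pi_s) - \tilde{J}(\pi_s)}_{\text{robustness gap}}
$$

Sweeping $s$ traces out the performance-robustness frontier: the middle term tends to grow with sparsity while the last term shrinks as pruning tightens the certificate.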
James Pedley
Machine Learning Research Group, Department of Engineering Science, University of Oxford
Benjamin Etheridge
Machine Learning Research Group, Department of Engineering Science, University of Oxford
Stephen J. Roberts
Machine Learning Research Group, Department of Engineering Science, University of Oxford
Francesco Quinzan
University of Oxford