🤖 AI Summary
This work addresses the challenge in offline safe reinforcement learning (OSRL) posed by safety constraints that change dynamically during deployment, making real-time adaptation difficult. To this end, we propose CAPS, a novel framework that introduces the first state-level policy-switching mechanism for constraint adaptation, enabling responsiveness to arbitrary runtime cost constraints without retraining on the fixed offline dataset. Methodologically, CAPS learns a multi-objective offline policy ensemble with shared state representations and incorporates a state-dependent online policy selector. Evaluated across all 38 tasks in the DSRL benchmark, CAPS significantly outperforms existing methods. It establishes the first wrapper-style baseline in OSRL that simultaneously achieves strong robustness and plug-and-play deployability, paving the way for practical, adaptive deployment of offline safe policies.
📝 Abstract
Offline safe reinforcement learning (OSRL) involves learning a decision-making policy from a fixed batch of training data to maximize rewards while satisfying pre-defined safety constraints. However, adapting to varying safety constraints during deployment without retraining remains an under-explored challenge. To address this challenge, we introduce constraint-adaptive policy switching (CAPS), a wrapper framework around existing offline RL algorithms. During training, CAPS uses offline data to learn multiple policies with a shared representation that optimize different reward-cost trade-offs. During testing, CAPS switches between those policies by selecting, at each state, the policy that maximizes future rewards among those that satisfy the current cost constraint. Our experiments on 38 tasks from the DSRL benchmark demonstrate that CAPS consistently outperforms existing methods, establishing a strong wrapper-based baseline for OSRL. The code is publicly available at https://github.com/yassineCh/CAPS.
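The test-time switching rule described above can be sketched in a few lines. This is a minimal illustrative sketch, not the paper's implementation: it assumes each candidate policy exposes learned estimates of future return (`q_reward`) and future cumulative cost (`q_cost`), and the fallback behavior when no policy is feasible is a hypothetical choice (pick the lowest-cost policy).

```python
from dataclasses import dataclass

@dataclass
class Policy:
    # Hypothetical stand-ins for learned reward/cost critics;
    # a real implementation would evaluate networks on the state.
    reward_est: float
    cost_est: float

    def q_reward(self, state):
        return self.reward_est

    def q_cost(self, state):
        return self.cost_est

def select_policy(state, policies, cost_limit):
    """State-level switching: among policies whose estimated future cost
    satisfies the current constraint, pick the one with the highest
    estimated future reward."""
    feasible = [p for p in policies if p.q_cost(state) <= cost_limit]
    if not feasible:
        # Assumed fallback: no policy meets the constraint,
        # so return the one with the lowest estimated cost.
        return min(policies, key=lambda p: p.q_cost(state))
    return max(feasible, key=lambda p: p.q_reward(state))
```

Because the constraint enters only as a runtime argument, tightening or loosening the cost limit at deployment changes which policies are feasible at each state without any retraining.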