AI Summary
This work addresses the challenge of balancing safety and reward in reinforcement learning under cumulative cost constraints: a setting where existing methods, primarily designed for hard instantaneous constraints, often fall short. The authors propose a Budget-Conditioned Reachability approach that extends reachability analysis to cumulative cost scenarios for the first time. By precomputing safe state-action sets offline, the method decouples reward maximization from safety enforcement, thereby circumventing the instability associated with min-max or Lagrangian-based optimization. Evaluated on standard offline safe reinforcement learning benchmarks and a real-world maritime navigation task, the proposed approach matches or exceeds the performance of state-of-the-art methods while rigorously guaranteeing cumulative safety throughout deployment.
Abstract
Sequential decision making with Markov Decision Processes underpins many real-world applications, and both model-based and model-free methods have achieved strong results in these settings. However, real-world tasks must balance reward maximization against safety constraints, often conflicting objectives that can lead to unstable min-max, adversarial optimization. A promising alternative is safety reachability analysis, which precomputes a forward-invariant safe state-action set, ensuring that an agent starting inside this set remains safe indefinitely. Yet most reachability-based methods address only hard instantaneous safety constraints, and little work extends reachability to cumulative cost constraints. To address this, we first define a safety-conditioned reachability set that decouples reward maximization from cumulative safety cost constraints. Second, we show how this set enforces safety constraints without unstable min-max or Lagrangian optimization, yielding a novel offline safe RL algorithm that learns a safe policy from a fixed dataset without environment interaction. Finally, experiments on standard offline safe RL benchmarks and a real-world maritime navigation task demonstrate that our method matches or outperforms state-of-the-art baselines while maintaining safety.
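The core mechanism described above, an offline-precomputed safe set conditioned on the remaining cumulative cost budget, can be sketched on a toy deterministic MDP. Everything below (the states, dynamics, costs, and budget values) is an illustrative assumption for exposition, not the paper's actual algorithm or benchmark:

```python
# Toy sketch of a budget-conditioned safe set (illustrative assumptions only).
# Offline stage: compute the minimal cumulative cost-to-go D(s) to reach the
# goal. Online stage: an action is "safe" at state s with remaining budget b
# iff its step cost plus D(next state) still fits inside b.
INF = float("inf")

states = [0, 1, 2, 3]            # state 3 is the absorbing goal
actions = ["fast", "slow"]

def step(s, a):
    """Deterministic toy dynamics: 'fast' advances 2 states at cost 2,
    'slow' advances 1 state at cost 1; the goal is absorbing and free."""
    if s == 3:
        return s, 0.0
    move, c = (2, 2.0) if a == "fast" else (1, 1.0)
    return min(s + move, 3), c

# Offline: minimal cumulative cost-to-go via value iteration.
D = {s: (0.0 if s == 3 else INF) for s in states}
for _ in range(len(states)):
    for s in states:
        if s != 3:
            D[s] = min(step(s, a)[1] + D[step(s, a)[0]] for a in actions)

def safe_actions(s, budget):
    """Budget-conditioned safe set: actions whose minimal remaining
    cumulative cost does not exceed the current budget."""
    return [a for a in actions
            if step(s, a)[1] + D[step(s, a)[0]] <= budget]
```

At deployment, a reward-maximizing policy would simply be restricted to `safe_actions(s, remaining_budget)` at each step, with the budget decremented by each incurred cost; this restriction, rather than a Lagrangian penalty, is what decouples reward optimization from safety enforcement.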