Sampling-Based Safe Reinforcement Learning

📅 2026-05-19
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses safe exploration in reinforcement learning with a model-based safe RL algorithm. The method approximates a worst-case optimization problem under uncertain dynamics by imposing constraints jointly over a finite set of dynamics samples, and it introduces a safety-aware exploration strategy based on epistemic uncertainty that removes the need for explicit exploration rewards. By combining deep ensembles to represent uncertainty, sampling-based model predictive control, and constrained optimization, the algorithm provides, under regularity conditions, high-probability safety guarantees throughout learning and a finite-time sample complexity bound for recovering a near-optimal policy. Empirically, the approach explores safely and efficiently on both simulated and real robotic platforms, scales to high-dimensional continuous control tasks, and recovers near-optimal policies.
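The joint constraint can be sketched as random-shooting model predictive control over a small set of sampled dynamics models, where a candidate action sequence is kept only if every sample predicts a safe trajectory. This is a minimal illustration under stated assumptions, not the authors' implementation: the scalar state, the `rollout` and `plan` helpers, uniform action sampling with `action_bound`, and the mean-return objective are all hypothetical choices made for brevity. In SBSRL the samples would come from the learned model (for example, ensemble members), and the actual optimizer may differ.

```python
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical scalar dynamics sample: (state, action) -> next state.
Dynamics = Callable[[float, float], float]


def rollout(model: Dynamics, x0: float, actions: Sequence[float]) -> List[float]:
    """Predicted state trajectory (including x0) of one dynamics sample."""
    xs = [x0]
    for a in actions:
        xs.append(model(xs[-1], a))
    return xs


def plan(
    models: Sequence[Dynamics],
    x0: float,
    reward: Callable[[float, float], float],
    is_safe: Callable[[float], bool],
    horizon: int = 10,
    num_candidates: int = 256,
    action_bound: float = 1.0,
    seed: int = 0,
) -> Optional[List[float]]:
    """Random-shooting MPC that keeps only action sequences safe under EVERY sample.

    Enforcing the constraint jointly over a finite set of samples stands in for
    the intractable worst case over all plausible dynamics (illustrative only).
    """
    rng = random.Random(seed)
    best_actions: Optional[List[float]] = None
    best_value = float("-inf")
    for _ in range(num_candidates):
        actions = [rng.uniform(-action_bound, action_bound) for _ in range(horizon)]
        trajectories = [rollout(m, x0, actions) for m in models]
        # Joint constraint: one predicted violation in any sample rejects the candidate.
        if not all(is_safe(x) for traj in trajectories for x in traj):
            continue
        # Hypothetical objective: predicted return averaged over the samples.
        value = sum(
            sum(reward(x, a) for x, a in zip(traj[:-1], actions))
            for traj in trajectories
        ) / len(models)
        if value > best_value:
            best_actions, best_value = actions, value
    return best_actions  # None: no candidate was safe for all samples
```

With three linear models `x + b * a` for `b` in `(0.8, 1.0, 1.2)`, a call such as `plan(models, 0.0, lambda x, a: x, lambda x: x <= 0.3, horizon=5, action_bound=0.2)` returns a sequence that stays at or below 0.3 under all three models, or `None` if no candidate is jointly safe. Intersecting the sampled constraints is what makes the plan conservative: a single pessimistic sample is enough to veto an action sequence. In receding-horizon use, only the first action would be applied before replanning.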
๐Ÿ“ Abstract
Safe exploration remains a fundamental challenge in reinforcement learning (RL), limiting the deployment of RL agents in the real world. We propose Sampling-Based Safe Reinforcement Learning (SBSRL), a model-based RL algorithm that maintains safety throughout the learning process by enforcing constraints jointly across a finite set of dynamics samples. This formulation approximates an intractable worst-case optimization over uncertain dynamics and enables practical safety guarantees in continuous domains. We further introduce an exploration strategy based on constraining epistemic uncertainty, eliminating the need for explicit exploration bonuses. Under regularity conditions, we derive high-probability guarantees of safety throughout learning and a finite-time sample complexity bound for recovering a near-optimal policy. Empirically, SBSRL achieves safe and efficient exploration both in simulation and in real robotic hardware, and readily extends to practical deep-ensemble implementations that scale to high-dimensional continuous control problems.
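Deep ensembles give a simple proxy for epistemic uncertainty: the spread of the members' predictions, which typically shrinks where training data is plentiful and grows elsewhere. The sketch below computes that disagreement per step and accumulates it along a planned action sequence. It rests on illustrative assumptions rather than the paper's exact quantity: the scalar models, the population standard deviation as the disagreement measure, and the helper names `epistemic_std` and `trajectory_uncertainty` are hypothetical, and the sketch stops at the uncertainty signal instead of reproducing how SBSRL constrains it during exploration.

```python
import statistics
from typing import Callable, Sequence

# Hypothetical ensemble member: (state, action) -> predicted next state.
Model = Callable[[float, float], float]


def epistemic_std(ensemble: Sequence[Model], x: float, a: float) -> float:
    """Disagreement of ensemble predictions at (x, a), as a population std."""
    return statistics.pstdev([m(x, a) for m in ensemble])


def trajectory_uncertainty(
    ensemble: Sequence[Model], x0: float, actions: Sequence[float]
) -> float:
    """Sum of per-step disagreement along the ensemble-mean rollout."""
    total, x = 0.0, x0
    for a in actions:
        total += epistemic_std(ensemble, x, a)
        # Propagate the mean prediction (one of several possible choices).
        x = statistics.fmean([m(x, a) for m in ensemble])
    return total
```

For members `x + b * a` with `b` in `(0.9, 1.0, 1.1)`, the disagreement is zero when `a = 0` and grows linearly with `|a|`, so actions whose effect the ensemble has not pinned down register as uncertain.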
Problem

Research questions and friction points this paper is trying to address.

Safe exploration
Reinforcement learning
Safety guarantees
Uncertain dynamics
Continuous control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Safe Reinforcement Learning
Model-Based RL
Epistemic Uncertainty
Sample Complexity
Deep Ensembles