🤖 AI Summary
To address the lack of safety guarantees when training reinforcement learning agents in unknown, black-box environments, this paper proposes ADVICE: a model-free, online, and interpretable adaptive action-masking framework. Its core innovation is a contrastive autoencoder that separates safety-critical features of state-action pairs without supervision, coupled with runtime action masking and online estimation of safety boundaries—enabling hazardous state-action pairs to be identified and suppressed in real time without prior domain knowledge. ADVICE maintains competitive task performance while substantially improving safety during training: experiments demonstrate roughly a 50% reduction in safety violations, with task reward matching that of current state-of-the-art methods. Crucially, the authors present ADVICE as the first approach to jointly address safety supervision and policy learning in black-box settings, establishing a reliable safety-aware training paradigm for real-world deployment.
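The masking mechanism described above—embedding state-action pairs and suppressing those that look unsafe—can be sketched in a few lines. This is a hypothetical illustration only, not the paper's implementation: in ADVICE the encoder would be a trained contrastive autoencoder, whereas here `encode` is a fixed linear map, and the safe/unsafe centroids and `threshold` are hand-picked placeholders.

```python
import numpy as np

# Placeholder "encoder": maps a 4-D (state, action) vector to a 2-D latent.
# In ADVICE this role is played by the contrastive autoencoder's encoder.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

def encode(state_action):
    return state_action @ W

# Assumed centroids of previously observed safe/unsafe latent embeddings.
SAFE_CENTROID = np.array([0.0, 0.0])
UNSAFE_CENTROID = np.array([3.0, 3.0])

def is_action_allowed(state, action, threshold=1.0):
    """Allow an action only if its latent embedding is closer to the safe
    centroid than to the unsafe one (scaled by `threshold`)."""
    z = encode(np.concatenate([state, action]))
    return np.linalg.norm(z - SAFE_CENTROID) < threshold * np.linalg.norm(z - UNSAFE_CENTROID)

def shielded_action(state, candidate_actions):
    """Return the first candidate that passes the mask; if none do, fall
    back to the candidate farthest from the unsafe centroid."""
    for a in candidate_actions:
        if is_action_allowed(state, a):
            return a
    return max(candidate_actions,
               key=lambda a: np.linalg.norm(
                   encode(np.concatenate([state, a])) - UNSAFE_CENTROID))
```

With this toy encoder, an action that pushes the embedding towards the unsafe centroid is masked out, while a near-origin action passes—mirroring, at a cartoon level, how a post-shield intercepts and replaces likely hazardous actions at runtime.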
📝 Abstract
Empowering safe exploration of reinforcement learning (RL) agents during training is a critical challenge towards their deployment in many real-world scenarios. When prior knowledge of the domain or task is unavailable, training RL agents in unknown, *black-box* environments presents an even greater safety risk. We introduce ADVICE (Adaptive Shielding with a Contrastive Autoencoder), a novel post-shielding technique that distinguishes safe and unsafe features of state-action pairs during training, and uses this knowledge to protect the RL agent from executing actions that yield likely hazardous outcomes. Our comprehensive experimental evaluation against state-of-the-art safe RL exploration techniques shows that ADVICE significantly reduces safety violations (≈50%) during training, with a competitive outcome reward compared to other techniques.