🤖 AI Summary
Existing reinforcement learning (RL) toolkits generally lack native support for hard safety constraints and interpretable decision-making. To address this, we propose the first lightweight, open-source library that integrates SHAP values and Grad-CAM saliency maps directly into the constrained RL training pipeline (specifically, the DQN framework) to jointly optimize for safety and interpretability. Our approach uses constraint-aware reward shaping and custom Gym environment wrappers, enabling integration with existing RL codebases without modification, and supports real-time attribution of decisions as well as quantitative violation analysis. Evaluated on a safety-constrained CartPole variant, our method reduces the safety violation rate to below 0.3%, produces visually verifiable attribution maps, achieves sub-8 ms per-step inference latency, and installs with a single pip command. This work addresses critical gaps in trustworthy, production-ready constrained RL.
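The constraint-aware reward shaping described above can be sketched in a few lines. This is an illustrative example, not the SafeRL-Lite API: the class names (`ConstraintWrapper`, `LineEnv`) and the penalty scheme are assumptions chosen to show the technique of wrapping an environment, penalizing unsafe states, and tracking the violation rate.

```python
# Illustrative sketch (not the SafeRL-Lite API): a constraint-aware wrapper
# that penalizes unsafe observations and tracks the violation rate.

class ConstraintWrapper:
    """Wraps any env exposing step(action) -> (obs, reward, done) and
    subtracts a penalty whenever a user-supplied constraint is violated."""

    def __init__(self, env, constraint_fn, penalty=10.0):
        self.env = env
        self.constraint_fn = constraint_fn  # returns True if obs is safe
        self.penalty = penalty
        self.steps = 0
        self.violations = 0

    def step(self, action):
        obs, reward, done = self.env.step(action)
        self.steps += 1
        if not self.constraint_fn(obs):
            self.violations += 1
            reward -= self.penalty  # constraint-aware reward shaping
        return obs, reward, done

    def violation_rate(self):
        return self.violations / max(self.steps, 1)


# Toy 1-D environment: the state drifts by the chosen action (-1 or +1).
class LineEnv:
    def __init__(self):
        self.x = 0.0

    def step(self, action):
        self.x += action
        return self.x, 1.0, abs(self.x) > 5

env = ConstraintWrapper(LineEnv(), constraint_fn=lambda x: abs(x) <= 2)
total = 0.0
for a in [1, 1, 1, -1, -1]:  # the third step leaves the safe region |x| <= 2
    _, r, _ = env.step(a)
    total += r
print(env.violation_rate())  # 1 violation in 5 steps -> 0.2
```

Because the wrapper only relies on the `step` signature, the underlying agent and training loop need no changes, which is what makes this pattern a zero-modification fit for existing codebases.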
📝 Abstract
We introduce SafeRL-Lite, an open-source Python library for building reinforcement learning (RL) agents that are both constrained and explainable. Existing RL toolkits often lack native mechanisms for enforcing hard safety constraints or producing human-interpretable rationales for decisions. SafeRL-Lite provides modular wrappers around standard Gym environments and deep Q-learning agents to enable: (i) safety-aware training via constraint enforcement, and (ii) real-time post-hoc explanation via SHAP values and saliency maps. The library is lightweight, extensible, and installable via pip, and includes built-in metrics for constraint violations. We demonstrate its effectiveness on constrained variants of CartPole and provide visualizations that reveal both policy logic and safety adherence. The full codebase is available at: https://github.com/satyamcser/saferl-lite.
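The post-hoc explanation side can be illustrated with a minimal perturbation-based attribution over a Q-function. This is a sketch in the spirit of the library's real-time explanations, not its implementation: actual SHAP values would come from the `shap` package, and the `toy_q` function and its weights are assumptions for demonstration.

```python
# Illustrative sketch (not SafeRL-Lite's implementation): perturbation-based
# feature attribution for a Q-function, approximating the kind of per-decision
# saliency the library reports. Real SHAP values require the `shap` package.

def perturbation_saliency(q_fn, state, action, eps=1e-3):
    """Score each state feature by how much a small nudge changes Q(s, a)."""
    base = q_fn(state, action)
    scores = []
    for i in range(len(state)):
        perturbed = list(state)
        perturbed[i] += eps
        scores.append(abs(q_fn(perturbed, action) - base) / eps)
    return scores

# Toy linear Q-function: action 0 weighs features [3, 0], action 1 weighs [0, 1].
weights = {0: [3.0, 0.0], 1: [0.0, 1.0]}

def toy_q(state, action):
    return sum(w * s for w, s in zip(weights[action], state))

sal = perturbation_saliency(toy_q, [0.5, -0.2], action=0)
```

For the linear toy Q-function the scores recover the action's weights, so feature 0 dominates the explanation for action 0; on a trained DQN the same loop yields a per-decision importance vector that can be rendered as an attribution map.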