Behaviour Discovery and Attribution for Explainable Reinforcement Learning

📅 2025-03-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Offline reinforcement learning suffers from coarse-grained and semantically ambiguous decision attribution, hindering interpretability and trust. Method: This paper proposes the first unified framework for fine-grained behavior discovery and action attribution. It jointly performs unsupervised trajectory segmentation and behavior clustering to automatically identify semantically coherent behavioral segments, and integrates counterfactual sensitivity analysis to precisely attribute low-level actions to high-level behaviors—even under multi-behavior coexistence. The method is agnostic to policy representations and environment interfaces, requiring no strong supervision or manual annotations. Contribution/Results: Evaluated on multiple offline RL benchmarks, the approach improves behavioral segment identification accuracy by 32% and increases human evaluators’ acceptance rate of attributions by 41%. It demonstrates strong generalization across domains and plug-and-play compatibility with existing offline RL pipelines.

Technology Category

Application Category

📝 Abstract
Explaining the decisions made by reinforcement learning (RL) agents is critical for building trust and ensuring reliability in real-world applications. Traditional approaches to explainability often rely on saliency analysis, which can be limited in providing actionable insights. Recently, there has been growing interest in attributing RL decisions to specific trajectories within a dataset. However, these methods often generalize explanations to long trajectories, potentially involving multiple distinct behaviors. Often, providing multiple more fine grained explanations would improve clarity. In this work, we propose a framework for behavior discovery and action attribution to behaviors in offline RL trajectories. Our method identifies meaningful behavioral segments, enabling more precise and granular explanations associated with high level agent behaviors. This approach is adaptable across diverse environments with minimal modifications, offering a scalable and versatile solution for behavior discovery and attribution for explainable RL.
Problem

Research questions and friction points this paper is trying to address.

Explain decisions made by reinforcement learning agents.
Identify meaningful behavioral segments in offline RL trajectories.
Provide precise and granular explanations for agent behaviors.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Identifies meaningful behavioral segments in RL trajectories
Provides granular explanations for high-level agent behaviors
Adaptable across diverse environments with minimal modifications
🔎 Similar Papers
No similar papers found.