Bridging Interpretability and Optimization: Provably Attribution-Weighted Actor-Critic in Reproducing Kernel Hilbert Spaces

📅 2025-12-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing Actor-Critic methods offer poor interpretability and fail to model the heterogeneous influence of individual state dimensions on the reward. To address this, we propose an attribution-aware, two-timescale, kernelized Actor-Critic algorithm. Our method is the first to integrate SHAP-based state attribution with reproducing kernel Hilbert space (RKHS) representations, via a Mahalanobis-gated reweighting mechanism. It employs vector-valued (Actor) and scalar-valued (Critic) RKHS models, computes attributions with kernel mean embeddings and conditional mean embeddings, and leverages sparse dictionary learning for efficient optimization. Theoretically, we establish non-asymptotic convergence guarantees and perturbation error bounds. Empirically, our approach significantly improves policy stability, training efficiency, and attribution interpretability across three continuous-control benchmarks.

📝 Abstract
Actor-critic (AC) methods are a cornerstone of reinforcement learning (RL) but offer limited interpretability. Current explainable RL methods seldom use state attributions to assist training; instead, they treat all state features equally, neglecting the heterogeneous impacts of individual state dimensions on the reward. We propose RKHS-SHAP-based Advanced Actor-Critic (RSA2C), an attribution-aware, kernelized, two-timescale AC algorithm comprising an Actor, a Value Critic, and an Advantage Critic. The Actor is instantiated in a vector-valued reproducing kernel Hilbert space (RKHS) with a Mahalanobis-weighted operator-valued kernel, while the Value Critic and Advantage Critic reside in scalar RKHSs. These RKHS-enhanced components use sparsified dictionaries: the Value Critic maintains its own dictionary, while the Actor and Advantage Critic share one. State attributions, computed from the Value Critic via RKHS-SHAP (kernel mean embeddings for on-manifold expectations and conditional mean embeddings for off-manifold expectations), are converted into Mahalanobis-gated weights that modulate the Actor's gradients and the Advantage Critic's targets. Theoretically, we derive a global, non-asymptotic convergence bound under state perturbations, showing stability through the perturbation-error term and efficiency through the convergence-error term. Empirical results on three standard continuous-control environments show that our algorithm achieves efficiency, stability, and interpretability.
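As a rough illustration of the gating step the abstract describes, the sketch below converts per-dimension attribution scores into Mahalanobis-scaled weights. The function name, the softmax gate, and the identity metric are illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import numpy as np

def mahalanobis_gate(attributions, M, temperature=1.0):
    """Convert per-dimension attribution scores into gating weights.

    attributions: (d,) SHAP-style attribution for each state dimension
    M: (d, d) positive-definite weighting matrix (Mahalanobis metric)
    Returns a (d,) weight vector that up-weights influential dimensions.
    """
    # Mahalanobis-scaled influence score for each dimension
    scaled = M @ attributions
    scores = np.abs(attributions * scaled)
    # Softmax gate: weights sum to 1, sharper as temperature -> 0
    z = scores / temperature
    z = z - z.max()  # numerical stability
    w = np.exp(z)
    return w / w.sum()

phi = np.array([0.8, -0.1, 0.05, 0.3])  # toy attributions
M = np.eye(4)                            # identity metric for the sketch
w = mahalanobis_gate(phi, M)
print(w)  # dimension 0 receives the largest weight
```

In an AC loop, such weights could rescale the per-dimension contribution of the state to the Actor's gradient and the Advantage Critic's targets, which is the role the abstract assigns to the Mahalanobis-gated weights.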
Problem

Research questions and friction points this paper is trying to address.

Enhances interpretability of actor-critic reinforcement learning methods
Incorporates state attributions to guide training and improve efficiency
Ensures stability and convergence under state perturbations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Attribution-aware kernelized actor-critic algorithm
RKHS-SHAP state attributions modulate gradients and targets
Sparsified dictionaries in RKHS for efficiency and stability
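The sparsified-dictionary idea above can be illustrated with a standard approximate-linear-dependence (ALD) test from the kernel RL literature; the RBF kernel, the threshold `nu`, and all names here are assumptions for the sketch, not the paper's construction.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    """Gaussian RBF kernel between two state vectors."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def build_dictionary(states, nu=0.1, gamma=1.0):
    """Greedy ALD-style sparsification: keep a state only if its RKHS
    feature is not approximately in the span of the current dictionary."""
    dictionary = [states[0]]
    for x in states[1:]:
        K = np.array([[rbf(a, b, gamma) for b in dictionary]
                      for a in dictionary])
        k = np.array([rbf(a, x, gamma) for a in dictionary])
        # Projection residual of phi(x) onto span of dictionary features
        c = np.linalg.solve(K + 1e-8 * np.eye(len(dictionary)), k)
        residual = rbf(x, x, gamma) - k @ c
        if residual > nu:
            dictionary.append(x)
    return dictionary

rng = np.random.default_rng(0)
states = rng.normal(size=(200, 4))
D = build_dictionary(states, nu=0.3)
print(len(D), "of", len(states), "states retained")
```

Keeping only near-linearly-independent states bounds the number of kernel evaluations per update, which is the efficiency benefit the bullet refers to.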