A Snapshot of Influence: A Local Data Attribution Framework for Online Reinforcement Learning

📅 2025-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
In online reinforcement learning (RL), training samples dynamically influence subsequent data collection, rendering conventional data attribution methods ineffective and hindering both interpretability and sample efficiency. To address this, we establish the first theoretical framework for data attribution in online RL and propose the first local attribution method tailored to Proximal Policy Optimization (PPO). Our approach quantifies the immediate impact of individual trajectories in the replay buffer on the current action selection and cumulative return via gradient similarity. We further introduce Iterative Influence Filtering (IIF), an attribution-driven algorithm for dynamic experience selection. Evaluated across classic control, navigation, biomimetic locomotion, and RLHF benchmarks, our method substantially reduces sample complexity, accelerates convergence, improves final policy performance, and enhances both policy interpretability and training stability.

📝 Abstract
Online reinforcement learning (RL) excels in complex, safety-critical domains, yet it faces challenges such as sample inefficiency, training instability, and a lack of interpretability. Data attribution offers a principled way to trace model behavior back to individual training samples. However, in online RL, each training sample not only drives policy updates but also influences future data collection, violating the fixed dataset assumption in existing attribution methods. In this paper, we initiate the study of data attribution for online RL, focusing on the widely used Proximal Policy Optimization (PPO) algorithm. We start by establishing a local attribution framework, interpreting model checkpoints with respect to the records in the recent training buffer. We design two target functions, capturing agent action and cumulative return respectively, and measure each record's contribution through gradient similarity between its training loss and these targets. We demonstrate the power of this framework through three concrete applications: diagnosis of learning, temporal analysis of behavior formation, and targeted intervention during training. Leveraging this framework, we further propose an algorithm, iterative influence-based filtering (IIF), for online RL training that iteratively performs experience filtering to refine policy updates. Across standard RL benchmarks (classic control, navigation, locomotion) to RLHF for large language models, IIF reduces sample complexity, speeds up training, and achieves higher returns. Overall, these results advance interpretability, efficiency, and effectiveness of online RL.
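The abstract describes scoring each buffer record by the gradient similarity between its training loss and a target function (agent action or cumulative return). A minimal sketch of that scoring idea, using cosine similarity over flattened gradient vectors; the function name `influence_score` and the use of plain NumPy vectors in place of per-parameter policy gradients are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def influence_score(grad_loss: np.ndarray, grad_target: np.ndarray) -> float:
    """Cosine similarity between a record's training-loss gradient and the
    gradient of a target function (e.g., current action log-prob or return).
    Positive scores suggest the record pushes the policy toward the target;
    negative scores suggest it pushes away. (Illustrative sketch only.)"""
    num = float(np.dot(grad_loss, grad_target))
    denom = float(np.linalg.norm(grad_loss) * np.linalg.norm(grad_target)) + 1e-12
    return num / denom

# Toy check: a record whose gradient is a positive multiple of the
# target gradient gets a score near 1 (maximally aligned).
g_record = np.array([1.0, 2.0, -0.5])
g_target = 0.5 * g_record
print(influence_score(g_record, g_target))
```

In practice the gradients would be taken with respect to the policy parameters at a given checkpoint, which is what makes the attribution "local" to that snapshot of training.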
Problem

Research questions and friction points this paper is trying to address.

Addresses sample inefficiency in online reinforcement learning
Develops data attribution for PPO algorithm interpretability
Proposes iterative filtering to improve training efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Local attribution framework for online RL
Gradient similarity measures data contribution
Iterative influence-based filtering improves training
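The IIF idea described above, filtering buffer records by their influence scores before each policy update, can be sketched roughly as follows. The function name `iif_filter`, the fixed `keep_ratio` heuristic, and the NumPy stand-ins for per-record gradients are all assumptions for illustration; the paper's actual selection rule may differ:

```python
import numpy as np

def iif_filter(record_grads, grad_target, keep_ratio=0.5):
    """One round of influence-based experience filtering (hypothetical sketch):
    score every record in the recent buffer by gradient similarity to the
    target, then keep only the top-scoring fraction for the next update."""
    scores = [
        float(np.dot(g, grad_target))
        / (float(np.linalg.norm(g) * np.linalg.norm(grad_target)) + 1e-12)
        for g in record_grads
    ]
    k = max(1, int(len(record_grads) * keep_ratio))
    ranked = sorted(range(len(record_grads)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]  # indices of the records retained for the policy update
```

Iterating this filter interleaves attribution with training: each update is computed only on the records currently judged helpful, which is the mechanism the abstract credits for the reported gains in sample efficiency and final return.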