SLEA-RL: Step-Level Experience Augmented Reinforcement Learning for Multi-Turn Agentic Training

📅 2026-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of existing experience-augmented methods for multi-turn agent training: they retrieve experiences based on a static task description and struggle to adapt as observations change. To overcome this, the authors propose SLEA-RL, a framework that dynamically retrieves relevant experiences at each decision step, conditioned on the current observation. SLEA-RL integrates clustering of structurally equivalent states, a self-evolving experience repository driven by semantic analysis, and a step-level credit assignment mechanism with fine-grained advantage estimation; the repository is updated without gradient computation. Notably, it is the first approach to support per-step dynamic experience retrieval in multi-turn tasks, and it introduces score-based admission control and rate-limited extraction to maintain experience quality. Experiments show that SLEA-RL significantly outperforms multiple reinforcement learning baselines on long-horizon, multi-turn agent benchmarks.
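The summary mentions score-based admission control and rate-limited extraction for the experience repository, but the paper's exact criteria are not given here. Below is a minimal hypothetical sketch of those two gates; the class names, the threshold of 0.6, and the time-window rate limit are all illustrative assumptions, not the paper's implementation.

```python
import time
from dataclasses import dataclass

@dataclass
class Experience:
    summary: str
    score: float  # assumed quality score from semantic analysis of a trajectory

class ExperienceLibrary:
    """Hypothetical sketch: score-based admission plus rate-limited extraction."""

    def __init__(self, admit_threshold=0.6, max_extractions_per_window=5,
                 window_seconds=60.0, clock=time.monotonic):
        self.admit_threshold = admit_threshold
        self.max_extractions = max_extractions_per_window
        self.window = window_seconds
        self.clock = clock          # injectable for testing
        self.entries = []
        self._extraction_times = []

    def try_extract(self):
        """Rate limit how often new experiences are distilled from trajectories."""
        now = self.clock()
        # Keep only extraction timestamps still inside the sliding window.
        self._extraction_times = [t for t in self._extraction_times
                                  if now - t < self.window]
        if len(self._extraction_times) >= self.max_extractions:
            return False
        self._extraction_times.append(now)
        return True

    def admit(self, exp: Experience):
        """Score-based admission: only experiences above the threshold enter."""
        if exp.score >= self.admit_threshold:
            self.entries.append(exp)
            return True
        return False
```

The design point is that both gates protect library quality: admission filters low-value distillations, while the rate limit keeps extraction from flooding the repository within a single training window.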

📝 Abstract
Large Language Model (LLM) agents have shown strong results on multi-turn tool-use tasks, yet they operate in isolation during training, failing to leverage experiences accumulated across episodes. Existing experience-augmented methods address this by organizing trajectories into retrievable libraries, but they retrieve experiences only once based on the initial task description and hold them constant throughout the episode. In multi-turn settings where observations change at every step, this static retrieval becomes increasingly mismatched as episodes progress. We propose SLEA-RL (Step-Level Experience-Augmented Reinforcement Learning), a framework that retrieves relevant experiences at each decision step conditioned on the current observation. SLEA-RL operates through three components: (i) step-level observation clustering that groups structurally equivalent environmental states for efficient cluster-indexed retrieval; (ii) a self-evolving experience library that distills successful strategies and failure patterns through score-based admission and rate-limited extraction; and (iii) policy optimization with step-level credit assignment for fine-grained advantage estimation across multi-turn episodes. The experience library evolves alongside the policy through semantic analysis rather than gradient updates. Experiments on long-horizon multi-turn agent benchmarks demonstrate that SLEA-RL achieves superior performance compared to various reinforcement learning baselines.
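Component (i) of the abstract, cluster-indexed retrieval over structurally equivalent observations, can be illustrated with a toy store. The structural signature below (keys plus value types) is an assumed stand-in for the paper's clustering criterion, which is not specified in this summary.

```python
from collections import defaultdict

def structural_key(observation: dict) -> tuple:
    """Map an observation to a structural signature: field names and value types.
    Observations that differ only in concrete values land in the same cluster
    (an illustrative proxy for 'structurally equivalent environmental states')."""
    return tuple(sorted((k, type(v).__name__) for k, v in observation.items()))

class ClusterIndexedStore:
    """Hypothetical per-step retrieval: experiences are bucketed by cluster key."""

    def __init__(self):
        self._index = defaultdict(list)

    def add(self, observation: dict, experience: str):
        self._index[structural_key(observation)].append(experience)

    def retrieve(self, observation: dict, k: int = 3):
        """Called at every decision step with the current observation."""
        return self._index[structural_key(observation)][:k]
```

Because lookup is a dictionary access on the cluster key, per-step retrieval stays cheap even as the library grows, which is the efficiency argument the abstract makes for cluster indexing.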
Problem

Research questions and friction points this paper is trying to address.

multi-turn agentic training
experience augmentation
step-level retrieval
reinforcement learning
large language model agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

step-level retrieval
experience-augmented reinforcement learning
self-evolving experience library
observation clustering
credit assignment
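The credit-assignment innovation above pairs step-level credit with advantage estimation across multi-turn episodes. A minimal sketch, assuming a terminal trajectory reward split over steps in proportion to per-step credit weights and then standardized (the actual estimator in the paper may differ):

```python
def step_level_advantages(step_credits, total_reward):
    """Illustrative step-level advantage estimation.

    step_credits: non-negative per-step credit weights for one episode
    total_reward: scalar reward for the whole trajectory
    Returns one standardized advantage per step.
    """
    n = len(step_credits)
    total = sum(step_credits)
    if total == 0:
        per_step = [total_reward / n] * n          # fall back to uniform credit
    else:
        per_step = [total_reward * c / total for c in step_credits]
    mean = sum(per_step) / n
    var = sum((r - mean) ** 2 for r in per_step) / n
    std = var ** 0.5 or 1.0                        # avoid division by zero
    return [(r - mean) / std for r in per_step]
```

Standardizing within the episode gives each step a signed advantage, so steps that earned more credit than average are reinforced and the rest are suppressed, which is the fine-grained signal coarse trajectory-level rewards cannot provide.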