Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning

📅 2026-05-11

🤖 AI Summary

This work addresses a limitation of existing reinforcement learning (RL) methods for large language model agents. These methods treat external skills as fixed: the skills are either kept permanently as guidance or internalized into the policy. As a result, the agent cannot adapt which skills are active as tasks and training stages change. The authors propose SLIM, a framework that casts skill lifecycle management as a dynamic optimization problem in which the active skill set is updated jointly with the policy. SLIM estimates the marginal contribution of each active skill via leave-one-skill-out validation. It then uses these estimates to retain high-value skills, retire skills whose contribution has become negligible, and expand the skill bank when persistent failures reveal missing capabilities. This departs from the prevailing paradigm in which skills are either permanently retained or fully internalized, and it yields task-adaptive skill scheduling. On the ALFWorld and SearchQA benchmarks, SLIM outperforms the best baseline by an average of 7.1 percentage points, supporting the benefit of co-optimizing dynamic skill management with policy learning.
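Leave-one-skill-out validation can be illustrated with a short ablation loop. The sketch below is not SLIM's implementation. It assumes a hypothetical `evaluate` callback that runs the current policy on a validation split with a given skill set and returns a score. Each skill's marginal contribution is taken as the score drop when only that skill is withheld.

```python
from typing import Callable, Dict, FrozenSet


def leave_one_skill_out(
    active: FrozenSet[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Estimate each active skill's marginal contribution by ablation.

    `evaluate` is a hypothetical callback: it scores the current policy on a
    validation split when the agent is given the provided skill set.
    """
    full_score = evaluate(active)
    contributions: Dict[str, float] = {}
    for skill in sorted(active):
        # Score drop when only this one skill is withheld from the agent.
        contributions[skill] = full_score - evaluate(active - {skill})
    return contributions


if __name__ == "__main__":
    # Toy additive evaluator: solved validation tasks out of 100.
    gains = {"search": 20, "navigate": 10, "format": 0}

    def toy_evaluate(skills: FrozenSet[str]) -> float:
        return 50 + sum(gains[s] for s in skills)

    print(leave_one_skill_out(frozenset(gains), toy_evaluate))
    # {'format': 0, 'navigate': 10, 'search': 20}
```

The toy evaluator is additive, so each estimate recovers that skill's gain exactly. Real skills can overlap or interact, and in that case leave-one-out differences only measure what each skill adds given all the others. The sketch also needs one extra evaluation per active skill, which is the main cost of this kind of estimate.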

📝 Abstract

Large language model agents increasingly rely on external skills to solve complex tasks, where skills act as modular units that extend their capabilities beyond what parametric memory alone supports. Existing methods assume that external skills either accumulate as persistent guidance or are internalized into the policy, eventually leading to zero-skill inference. We argue that this assumption is overly restrictive: with limited parametric capacity and uneven marginal contribution across skills, the optimal active skill set is non-monotonic and depends on both the task and the training stage. In this work, we propose SLIM, a framework of dynamic Skill LIfecycle Management for agentic reinforcement learning (RL), which treats the active external skill set as a dynamic optimization variable jointly updated with policy learning. Specifically, SLIM estimates each active skill's marginal external contribution through leave-one-skill-out validation, then applies three lifecycle operations: retaining high-value skills, retiring skills whose contribution becomes negligible after sufficient exposure, and expanding the skill bank when persistent failures reveal missing capability coverage. Experiments show that SLIM outperforms the best baselines by an average of 7.1 percentage points across ALFWorld and SearchQA. Results further indicate that policy learning and external skill retention are not mutually exclusive: some skills are absorbed into the policy, while others continue to provide external value, supporting SLIM as a more general paradigm for skill-based agentic RL.
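The three lifecycle operations can be sketched as one update round over a skill bank. This is a minimal illustration under stated assumptions, not the paper's algorithm. The abstract does not give the decision rules, so the following are hypothetical: the threshold `epsilon` for a "negligible" contribution, the `min_exposure` count for "sufficient exposure", the `failure_patience` count for "persistent failures", and the `propose_skill` generator.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SkillBank:
    # Active skills mapped to the number of lifecycle rounds each has been active.
    exposure: Dict[str, int] = field(default_factory=dict)
    retired: List[str] = field(default_factory=list)


def lifecycle_step(
    bank: SkillBank,
    contributions: Dict[str, float],
    failure_streaks: Dict[str, int],
    propose_skill: Callable[[str], str],
    epsilon: float = 0.01,      # hypothetical "negligible contribution" threshold
    min_exposure: int = 3,      # hypothetical "sufficient exposure" threshold
    failure_patience: int = 5,  # hypothetical "persistent failure" threshold
) -> SkillBank:
    """One hypothetical round of retain / retire / expand."""
    for skill in list(bank.exposure):  # copy keys so we can delete safely
        bank.exposure[skill] += 1
        score = contributions.get(skill)
        negligible = score is not None and score <= epsilon
        if negligible and bank.exposure[skill] >= min_exposure:
            # Retire: contribution stayed negligible after enough exposure.
            del bank.exposure[skill]
            bank.retired.append(skill)
        # Otherwise the skill is retained unchanged.
    for task_type, streak in sorted(failure_streaks.items()):
        if streak >= failure_patience:
            # Expand: repeated failures suggest a missing capability.
            new_skill = propose_skill(task_type)
            if new_skill not in bank.exposure:
                bank.exposure[new_skill] = 0
                if new_skill in bank.retired:
                    bank.retired.remove(new_skill)
    return bank


if __name__ == "__main__":
    bank = SkillBank(exposure={"search": 2, "navigate": 5, "format": 5})
    contributions = {"search": 0.2, "navigate": 0.0, "format": 0.15}
    failure_streaks = {"multi_hop": 6, "lookup": 1}
    lifecycle_step(bank, contributions, failure_streaks, lambda t: f"skill_for_{t}")
    print(bank.exposure)  # {'search': 3, 'format': 6, 'skill_for_multi_hop': 0}
    print(bank.retired)   # ['navigate']
```

In this example, `navigate` is retired because its estimated contribution is zero after six rounds of exposure. `search` and `format` are retained. The repeated `multi_hop` failures trigger a new placeholder skill. The exposure gate keeps a newly added skill from being retired before the policy has had a chance to use it.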

Problem

Research questions and friction points this paper is trying to address.

Skill Lifecycle

Agentic Reinforcement Learning

External Skills

Dynamic Skill Management

Marginal Contribution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Skill Lifecycle Management

Agentic Reinforcement Learning

Dynamic Skill Selection