SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

📅 2026-04-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limitations of large language model (LLM) agents that rely on external skill retrieval at inference time: susceptibility to retrieval noise, high context overhead, and failure to internalize the knowledge being followed. To overcome these challenges, the authors propose an in-context reinforcement learning framework for skill internalization, featuring a dynamic curriculum that progressively withdraws skill prompts during training. Skills are retained based on their on-policy returns, and the skill-usage budget is linearly decayed, enabling a smooth transition from full-context reliance to zero-shot execution. The approach integrates offline skill categorization, visual encoding of interaction histories, and a dynamic scheduling policy. Evaluated on ALFWorld and Search-QA, it outperforms the standard reinforcement learning baseline by 9.7% and 6.6%, respectively, while keeping per-step context usage below 0.5k tokens, thereby significantly enhancing both performance and efficiency.
📝 Abstract
Agent skills, structured packages of procedural knowledge and executable resources that agents dynamically load at inference time, have become a reliable mechanism for augmenting LLM agents. Yet inference-time skill augmentation is fundamentally limited: retrieval noise introduces irrelevant guidance, injected skill content imposes substantial token overhead, and the model never truly acquires the knowledge it merely follows. We ask whether skills can instead be internalized into model parameters, enabling zero-shot autonomous behavior without any runtime skill retrieval. We introduce SKILL0, an in-context reinforcement learning framework designed for skill internalization. SKILL0 introduces a training-time curriculum that begins with full skill context and progressively withdraws it. Skills are grouped offline by category and rendered with interaction history into a compact visual context, teaching the model tool invocation and multi-turn task completion. A Dynamic Curriculum then evaluates each skill file's on-policy helpfulness, retaining only those from which the current policy still benefits within a linearly decaying budget, until the agent operates in a fully zero-shot setting. Extensive agentic experiments demonstrate that SKILL0 achieves substantial improvements over the standard RL baseline (+9.7% for ALFWorld and +6.6% for Search-QA), while maintaining a highly efficient context of fewer than 0.5k tokens per step. Our code is available at https://github.com/ZJU-REAL/SkillZero.
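The Dynamic Curriculum described above can be sketched in a few lines: skill files are scored by how much the current policy still benefits from them, and only the most helpful ones are kept within a linearly decaying budget. This is a minimal illustration, not the authors' implementation; the function names, the positive-return helpfulness criterion, and the example numbers are all assumptions.

```python
def linear_budget(step: int, total_steps: int, initial_budget: int) -> int:
    """Linearly decay the number of skill files allowed in context,
    reaching zero (fully zero-shot) at the end of training."""
    frac = max(0.0, 1.0 - step / total_steps)
    return int(round(initial_budget * frac))


def select_skills(skill_returns: dict[str, float], budget: int) -> list[str]:
    """Retain only skills the current policy still benefits from
    (assumed here to mean a positive on-policy return advantage),
    ranked by helpfulness, within the current budget."""
    helpful = {s: r for s, r in skill_returns.items() if r > 0.0}
    ranked = sorted(helpful, key=helpful.get, reverse=True)
    return ranked[:budget]


# Hypothetical example: at the midpoint of training with an initial
# budget of 4 skill files, the budget halves and only the two most
# helpful skills remain in context.
returns = {"navigate": 0.30, "pick_place": 0.05,
           "open_close": -0.10, "search": 0.20}
budget = linear_budget(step=500, total_steps=1000, initial_budget=4)
print(budget)                          # -> 2
print(select_skills(returns, budget))  # -> ['navigate', 'search']
```

By the final step `linear_budget` returns 0, so `select_skills` keeps nothing and the agent runs without any injected skill context, matching the zero-shot endpoint the abstract describes.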
Problem

Research questions and friction points this paper is trying to address.

skill internalization
in-context reinforcement learning
zero-shot agentic behavior
LLM agents
runtime skill retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

skill internalization
in-context reinforcement learning
zero-shot agentic behavior
dynamic curriculum
compact visual context