Co-Evolving LLM Decision and Skill Bank Agents for Long-Horizon Tasks

📅 2026-04-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the difficulty large language models (LLMs) have in sustaining effective decision-making over long-horizon interactive tasks, which the authors attribute to the lack of a mechanism for discovering, retaining, and reusing structured skills. They propose COSPLAY, a framework that co-evolves a decision-making agent and a learnable skill bank. The agent retrieves skills from the bank to guide its actions, while an agent-managed pipeline discovers, refines, and updates reusable skills, together with their contracts, from the agent's unlabeled rollouts, closing a loop of skill discovery, refinement, and use. In experiments across six game environments, COSPLAY with an 8B-parameter base model achieves over 25.1% average reward improvement against four frontier LLM baselines on single-player game benchmarks and remains competitive on multi-player social reasoning games.

📝 Abstract

Long-horizon interactive environments, and games in particular, are a testbed for evaluating agents' skill-usage abilities. These environments demand multi-step reasoning, the chaining of multiple skills over many timesteps, and robust decision-making under delayed rewards and partial observability. Large Language Models (LLMs) offer a promising alternative as game-playing agents, but they often struggle with consistent long-horizon decision-making because they lack a mechanism to discover, retain, and reuse structured skills across episodes. We present COSPLAY, a co-evolution framework in which an LLM decision agent retrieves skills from a learnable skill bank to guide action taking, while an agent-managed skill pipeline discovers reusable skills from the agent's unlabeled rollouts to form the skill bank. Our framework improves both sides: the decision agent learns better skill retrieval and action generation, while the skill bank agent continually extracts, refines, and updates skills together with their contracts. Experiments across six game environments show that COSPLAY with an 8B base model achieves over 25.1% average reward improvement against four frontier LLM baselines on single-player game benchmarks while remaining competitive on multi-player social reasoning games.
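The loop described above (discover skills from unlabeled rollouts, store them with contracts, retrieve them to guide actions) can be illustrated with a toy sketch. Everything here is an assumption made for illustration and is not taken from the paper: the names `Skill` and `SkillBank`, the `precondition` and `effect` contract fields, discovery by counting repeated action n-grams, and retrieval by word overlap are simple stand-ins for the LLM-driven components the abstract describes.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    actions: tuple          # action sequence the skill stands for
    precondition: str = ""  # hypothetical contract field: when the skill applies
    effect: str = ""        # hypothetical contract field: what it should achieve


@dataclass
class SkillBank:
    skills: dict = field(default_factory=dict)

    def discover(self, rollouts, n=2, min_count=2):
        """Add every action n-gram seen at least min_count times across rollouts.

        Stand-in for skill discovery; the paper's pipeline is agent-managed.
        """
        counts = Counter(
            tuple(r[i:i + n]) for r in rollouts for i in range(len(r) - n + 1)
        )
        for gram, count in counts.items():
            if count >= min_count:
                name = "+".join(gram)
                self.skills.setdefault(name, Skill(name=name, actions=gram))

    def retrieve(self, observation, k=1):
        """Return the k skills sharing the most words with the observation.

        Stand-in for learned retrieval; ties are broken by skill name.
        """
        obs_words = set(observation.lower().split())

        def score(skill):
            text = skill.precondition + " " + " ".join(skill.actions)
            return len(obs_words & set(text.lower().split()))

        ranked = sorted(self.skills.values(), key=lambda s: (-score(s), s.name))
        return ranked[:k]


# Two unlabeled rollouts (action strings only, no skill labels).
rollouts = [
    ["open door", "take key", "unlock chest"],
    ["look", "open door", "take key"],
]
bank = SkillBank()
bank.discover(rollouts)
best = bank.retrieve("a closed door with a key behind it")[0]
print(best.name, best.actions)
```

In this toy run only the pair "open door" then "take key" recurs across rollouts, so it is the single skill stored and the one retrieved. A co-evolving setup would alternate between collecting new rollouts with the retrieved skills and calling a discovery step like `discover` again on those rollouts.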

Problem

Research questions and friction points this paper is trying to address.

long-horizon tasks

skill reuse

large language models

decision making

partial observability

Innovation

Methods, ideas, or system contributions that make the work stand out.

co-evolution

skill bank

long-horizon decision making