MetaClaw: Just Talk -- An Agent That Meta-Learns and Evolves in the Wild

πŸ“… 2026-03-17
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the challenge that existing large language model (LLM) agents struggle to adapt dynamically to evolving user demands in continuous service settings, often hindered by static skill sets, insufficient knowledge distillation, or the need for disruptive downtime updates. To overcome these limitations, we propose a continual meta-learning framework that enables zero-downtime online evolution through skill-driven rapid adaptation and opportunistic window-based policy optimization. Our key innovations include a bidirectional enhancement mechanism integrating skill composition with policy gradients, an Opportunistic Meta-Learning Scheduler (OMLS), and data version control. Evaluated on MetaClaw-Bench and AutoResearchClaw, skill-driven adaptation alone improves accuracy by up to 32% relative, while the full pipeline boosts Kimi-K2.5’s accuracy from 21.4% to 40.6% and enhances composite robustness by 18.3%.
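
To make the skill-driven fast-adaptation loop concrete, here is a minimal Python sketch of how an LLM "evolver" might distill a reusable skill from a failure trajectory and add it to a live library with zero downtime. The `Skill` dataclass, `evolve_skill` helper, and the `llm.complete` call are illustrative assumptions, not the paper's actual API.

```python
from dataclasses import dataclass

@dataclass
class Skill:
    name: str
    instructions: str  # distilled behavioral guidance injected into future prompts

def evolve_skill(llm, failure_trajectory: list[dict]) -> Skill:
    """Distill a reusable skill from a failed trajectory via an LLM evolver."""
    transcript = "\n".join(f"{t['role']}: {t['content']}" for t in failure_trajectory)
    prompt = (
        "The following agent trajectory failed. Identify the root cause and "
        "write a short reusable skill (first line: name; rest: instructions) "
        "that would prevent this class of failure:\n\n" + transcript
    )
    response = llm.complete(prompt)  # assumed text-completion interface
    name, _, instructions = response.partition("\n")
    return Skill(name=name.strip(), instructions=instructions.strip())

# Zero-downtime update: the new skill joins the live library and is
# available on the very next request; no retraining or restart needed.
skill_library: list[Skill] = []
# skill_library.append(evolve_skill(llm, failed_trajectory))
```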

πŸ“ Abstract
Large language model (LLM) agents are increasingly used for complex tasks, yet deployed agents often remain static, failing to adapt as user needs evolve. This creates a tension between the need for continuous service and the necessity of updating capabilities to match shifting task distributions. On platforms like OpenClaw, which handle diverse workloads across 20+ channels, existing methods either store raw trajectories without distilling knowledge, maintain static skill libraries, or require disruptive downtime for retraining. We present MetaClaw, a continual meta-learning framework that jointly evolves a base LLM policy and a library of reusable behavioral skills. MetaClaw employs two complementary mechanisms. Skill-driven fast adaptation analyzes failure trajectories via an LLM evolver to synthesize new skills, enabling immediate improvement with zero downtime. Opportunistic policy optimization performs gradient-based updates via cloud LoRA fine-tuning and Reinforcement Learning with a Process Reward Model (RL-PRM). This is triggered during user-inactive windows by the Opportunistic Meta-Learning Scheduler (OMLS), which monitors system inactivity and calendar data. These mechanisms are mutually reinforcing: a refined policy generates better trajectories for skill synthesis, while richer skills provide higher-quality data for policy optimization. To prevent data contamination, a versioning mechanism separates support and query data. Built on a proxy-based architecture, MetaClaw scales to production-size LLMs without local GPUs. Experiments on MetaClaw-Bench and AutoResearchClaw show that skill-driven adaptation improves accuracy by up to 32% relative. The full pipeline advances Kimi-K2.5 accuracy from 21.4% to 40.6% and increases composite robustness by 18.3%. Code is available at https://github.com/aiming-lab/MetaClaw.
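
The opportunistic policy optimization described above can be illustrated with a small scheduler loop in the spirit of OMLS. Everything below is a hedged sketch: `IDLE_THRESHOLD_S`, `idle_seconds`, `calendar_is_free`, and `submit_lora_job` are hypothetical stand-ins for the paper's inactivity monitoring, calendar integration, and cloud LoRA + RL-PRM fine-tuning, not its actual interfaces.

```python
import time

IDLE_THRESHOLD_S = 30 * 60            # assumed 30-minute inactivity window
_last_interaction = time.monotonic()  # updated by the serving layer on each request

def idle_seconds() -> float:
    """Time since the last user interaction."""
    return time.monotonic() - _last_interaction

def calendar_is_free(horizon_s: float) -> bool:
    # Placeholder: consult the user's calendar for upcoming sessions.
    return True

def submit_lora_job(support_version: str) -> None:
    # Placeholder: launch a cloud LoRA + RL-PRM fine-tuning run on the
    # frozen 'support' data snapshot identified by support_version.
    print(f"submitting LoRA job on support set {support_version}")

def omls_loop(support_version: str, poll_s: float = 60.0) -> None:
    """Trigger gradient-based updates only in user-inactive windows."""
    while True:
        if idle_seconds() >= IDLE_THRESHOLD_S and calendar_is_free(IDLE_THRESHOLD_S):
            submit_lora_job(support_version)
            return  # one update per idle window in this sketch
        time.sleep(poll_s)
```

Training only on a frozen support snapshot, while held-out query data stays untouched for evaluation, mirrors the abstract's versioning mechanism for preventing data contamination.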
Problem

Research questions and friction points this paper is trying to address.

continual adaptation
large language model agents
evolving user needs
task distribution shift
static skill libraries
Innovation

Methods, ideas, or system contributions that make the work stand out.

continual meta-learning
skill-driven adaptation
opportunistic policy optimization
LLM agent evolution
zero-downtime learning