AI Summary
This work addresses the trade-off that personal LLM agents face among privacy preservation, inference cost, and task performance: cloud-based models risk leaking sensitive context, while local models are less reliable, and both pay repeatedly for long skill prompts and growing histories. To address this, the authors propose a constant-context skill learning framework that shifts recurring task procedures from prompts into model weights. Lightweight task-family modules learn the reusable skills, so inference conditions only on the current observation and a compact state block. A deterministic state tracker renders this state block from task progress and supplies aligned subgoal rewards, and each module is trained with step-level supervised fine-tuning and then refined through online reinforcement learning. Evaluated on ALFWorld, WebShop, and SciWorld, the approach matches or exceeds strong published agent-training results: with Qwen3-8B it reaches 89.6% unseen success on ALFWorld, 76.8% success on WebShop, and 66.4% unseen success on SciWorld, while reducing per-turn prompt tokens by 2–7× relative to controlled ReAct prompting baselines.
Abstract
Large language model (LLM) agents are increasingly used to operate browsers, files, code and tools, making personal assistants a natural deployment target. Yet personal agents face a privacy-cost-capability tension: cloud models execute multi-step workflows well but expose sensitive intermediate context to external APIs, while local models preserve privacy but remain less reliable. Both settings also pay repeatedly for long skill prompts and growing histories. We propose constant-context skill learning, a context-to-weights framework for recurring agent workflows: reusable procedures are learned in lightweight task-family modules, while inference conditions only on the current observation and a compact state block. A deterministic tracker renders this state block from task progress and supplies aligned subgoal rewards, so each module can be trained with step-level SFT and refined through online RL. On ALFWorld, WebShop, and SciWorld, our agents achieve strong performance with Qwen3-4B, Qwen3-8B and Llama-3.1-8B. With Qwen3-8B, SFT+RL reaches 89.6% unseen success on ALFWorld, 76.8% success on WebShop, and 66.4% unseen success on SciWorld. These agents match or exceed strong published agent-training results while reducing prompt tokens per turn by 2–7× relative to controlled ReAct prompting baselines, showing that procedural context can be moved from prompts into weights.
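To make the constant-context idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a task family can be described by a fixed list of named subgoals. The names `SubgoalTracker`, `update`, `render`, and `build_prompt`, the equal-share reward, and the prompt layout are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SubgoalTracker:
    """Hypothetical deterministic tracker for one task family."""

    subgoals: list[str]  # assumed fixed list of subgoal names
    done: set[str] = field(default_factory=set)

    def update(self, achieved: Optional[str]) -> float:
        """Record a newly achieved subgoal and return its step reward.

        Assumption: each subgoal is worth an equal share of 1.0 and is
        rewarded only the first time it is achieved.
        """
        if achieved in self.subgoals and achieved not in self.done:
            self.done.add(achieved)
            return 1.0 / len(self.subgoals)
        return 0.0

    def render(self) -> str:
        """Render the compact state block: one line per subgoal."""
        lines = [
            f"[{'x' if goal in self.done else ' '}] {goal}"
            for goal in self.subgoals
        ]
        return "STATE\n" + "\n".join(lines)


def build_prompt(observation: str, tracker: SubgoalTracker) -> str:
    """Build a prompt from the state block and the current observation only.

    No interaction history or skill instructions are included, so the
    prompt does not grow with the number of turns.
    """
    return f"{tracker.render()}\nOBSERVATION\n{observation}\nACTION:"
```

In this sketch the prompt size depends on the number of subgoals and the length of the current observation, not on how many turns have elapsed. The same tracker that renders the state block also produces the per-step reward, which is one way the state and the subgoal rewards could stay aligned.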