Online Experiential Learning for Language Models

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a limitation of current large language models: they rely on offline training and cannot continuously improve from real-world interaction experience after deployment. To overcome this, the authors propose Online Experiential Learning (OEL), a framework that enables models to autonomously learn from their own interaction trajectories without accessing user environments. OEL employs a two-stage mechanism: it first extracts transferable experiential knowledge from collected trajectories, then consolidates that knowledge into model parameters via on-policy context distillation, and iterates the two stages to form a closed learning loop. Experiments show that OEL consistently improves task accuracy and token efficiency across model scales in text-based game environments while preserving out-of-distribution generalization. Moreover, the extracted experiential knowledge consistently outperforms direct use of raw interaction trajectories.

📝 Abstract
The prevailing paradigm for improving large language models relies on offline training with human annotations or simulated environments, leaving the rich experience accumulated during real-world deployment entirely unexploited. We propose Online Experiential Learning (OEL), a framework that enables language models to continuously improve from their own deployment experience. OEL operates in two stages: first, transferable experiential knowledge is extracted and accumulated from interaction trajectories collected on the user side; second, this knowledge is consolidated into model parameters via on-policy context distillation, requiring no access to the user-side environment. The two stages are iterated to form an online learning loop, where the improved model collects higher-quality trajectories that yield richer experiential knowledge for subsequent rounds. We evaluate OEL on text-based game environments across multiple model scales and both thinking and non-thinking variants. OEL achieves consistent improvements over successive iterations, enhancing both task accuracy and token efficiency while preserving out-of-distribution performance. Our analysis further shows that extracted experiential knowledge is significantly more effective than raw trajectories, and that on-policy consistency between the knowledge source and the policy model is critical for effective learning.
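The two-stage loop in the abstract can be sketched in toy form. All names here (`Trajectory`, `extract_knowledge`, `Policy.distill`, `oel_round`) are hypothetical stand-ins: the paper's actual extraction and on-policy context distillation procedures operate on language model trajectories and parameters, whereas this sketch uses a set-based heuristic purely to illustrate the control flow of one OEL iteration.

```python
# Illustrative sketch of one OEL round, assuming a toy policy and
# a toy extraction heuristic (NOT the paper's actual method).
from dataclasses import dataclass, field

@dataclass
class Trajectory:
    steps: list   # (state, action) pairs collected on the user side
    success: bool # task outcome of the episode

def extract_knowledge(trajectories):
    """Stage 1 stand-in: extract transferable knowledge from trajectories.
    Toy heuristic: keep actions that appeared in successful episodes."""
    knowledge = set()
    for t in trajectories:
        if t.success:
            knowledge.update(action for _, action in t.steps)
    return knowledge

@dataclass
class Policy:
    known_good_actions: set = field(default_factory=set)

    def distill(self, knowledge):
        """Stage 2 stand-in for on-policy context distillation:
        consolidate extracted knowledge into the policy's 'parameters'."""
        self.known_good_actions |= knowledge

def oel_round(policy, collect_fn):
    """One iteration of the closed loop: collect -> extract -> distill.
    collect_fn plays the role of user-side deployment; the server never
    touches the environment itself, only the returned trajectories."""
    trajectories = collect_fn(policy)            # deployment interaction
    knowledge = extract_knowledge(trajectories)  # stage 1
    policy.distill(knowledge)                    # stage 2
    return policy
```

Iterating `oel_round` mirrors the paper's claim that the improved policy then collects higher-quality trajectories, which yield richer knowledge in subsequent rounds.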
Problem

Research questions and friction points this paper is trying to address.

online learning
experiential learning
language models
deployment experience
knowledge extraction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online Experiential Learning
on-policy context distillation
experience extraction
continuous learning
language model adaptation