🤖 AI Summary
This work addresses the limitations of existing self-evolving memory systems, which struggle with continuous feedback and distribution shifts in real-world scenarios due to their reliance on static data partitioning. To overcome this, we propose an online self-evolving memory system that decouples experience storage from usage policy through an Experience Bank and a Meta-Guideline Bank. Coupled with a dynamic memory-weighting mechanism driven by continuous feedback, our approach enables human-like reinforcement and forgetting. This is the first method to achieve online memory evolution over continuous data streams, incorporating experience weighting and decay strategies that significantly enhance long-term adaptability and transfer capability. Evaluated on the live Prophet Arena benchmark, our system improves Brier score by 20.8% and market returns by 12.9% over ten weeks, while consistently outperforming strong baselines on deep-research tasks.
📝 Abstract
Large language model (LLM) agents are increasingly equipped with memory: stored experiences and reusable guidance that can improve task-solving performance. Recent \emph{self-evolving} systems update memory based on interaction outcomes, but most existing evolution pipelines are developed for static train/test splits and only approximate online learning by folding static benchmarks, making them brittle under true distribution shift and continuous feedback. We introduce \textsc{Live-Evo}, an online self-evolving memory system that learns from a stream of incoming data over time. \textsc{Live-Evo} decouples \emph{what happened} from \emph{how to use it} via an Experience Bank and a Meta-Guideline Bank, compiling task-adaptive guidelines from retrieved experiences for each task. To manage memory online, \textsc{Live-Evo} maintains experience weights and updates them from feedback: experiences that consistently help are reinforced and retrieved more often, while misleading or stale experiences are down-weighted and gradually forgotten, analogous to reinforcement and decay in human memory. On the live \textit{Prophet Arena} benchmark over a 10-week horizon, \textsc{Live-Evo} improves Brier score by 20.8\% and increases market returns by 12.9\%, while also transferring to deep-research benchmarks with consistent gains over strong baselines. Our code is available at https://github.com/ag2ai/Live-Evo.
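The abstract's weighting mechanism (reinforce helpful experiences, down-weight and eventually forget misleading or stale ones) can be pictured with a minimal sketch. The class name, update rules, and constants below are illustrative assumptions for intuition only, not the actual Live-Evo implementation:

```python
class ExperienceBank:
    """Toy sketch of a feedback-weighted experience memory.

    Assumed mechanics (not from the paper): additive reinforcement on
    positive feedback, a larger penalty on negative feedback, multiplicative
    time decay each step, and pruning once a weight falls to a floor.
    """

    def __init__(self, reinforce=0.2, penalize=0.3, decay=0.95, floor=0.05):
        self.reinforce = reinforce
        self.penalize = penalize
        self.decay = decay
        self.floor = floor
        self.items = {}  # exp_id -> {"text": ..., "weight": ...}

    def add(self, exp_id, text):
        # New experiences start at a neutral weight.
        self.items[exp_id] = {"text": text, "weight": 1.0}

    def feedback(self, exp_id, helped):
        # Reinforce experiences that helped; penalize misleading ones.
        w = self.items[exp_id]["weight"]
        w = w + self.reinforce if helped else w - self.penalize
        self.items[exp_id]["weight"] = max(self.floor, w)

    def step(self):
        # Time decay: stale experiences are gradually forgotten,
        # and those at or below the floor are pruned entirely.
        for item in self.items.values():
            item["weight"] *= self.decay
        self.items = {k: v for k, v in self.items.items()
                      if v["weight"] > self.floor}

    def retrieve(self, k=3):
        # Weight-biased retrieval: consistently helpful experiences
        # surface more often.
        ranked = sorted(self.items.items(),
                        key=lambda kv: kv[1]["weight"], reverse=True)
        return ranked[:k]
```

Under this sketch, an experience that keeps receiving positive feedback dominates retrieval, while one that is never reinforced decays below the floor and is dropped, mirroring the reinforcement-and-decay analogy in the abstract.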