🤖 AI Summary
In multi-turn human-agent collaboration, user needs are dynamic and uncertain. Existing gradient-based agent methods handle this poorly: they are costly to train, struggle with credit assignment, and are hard to interpret. This work proposes PRIME, a gradient-free framework for proactive reasoning that requires no parameter updates. Through iterative memory evolution, PRIME distills interaction trajectories into three types of semantic experience (successful strategies, failure patterns, and user preferences). These experiences are refined through meta-level memory operations and supplied to the agent via retrieval-augmented generation, which makes policy refinement efficient and interpretable. Experiments show that PRIME performs on par with gradient-based methods across multiple user-centric tasks while substantially reducing computational overhead and making decisions more transparent.
📝 Abstract
Developing autonomous tool-use agents that handle complex, long-horizon tasks in collaboration with human users has become a frontier of agentic research. In multi-turn Human-AI interactions, the dynamic and uncertain nature of user demands poses a significant challenge: agents must not only invoke tools but also iteratively refine their understanding of user intent through effective communication. While recent advances in reinforcement learning offer a path to more capable tool-use agents, existing approaches incur high training costs and struggle with turn-level credit assignment across extended interaction horizons. To this end, we introduce PRIME (Proactive Reasoning via Iterative Memory Evolution), a gradient-free learning framework that enables continuous agent evolution through explicit experience accumulation rather than expensive parameter optimization. PRIME distills multi-turn interaction trajectories into structured, human-readable experiences organized across three semantic zones: successful strategies, failure patterns, and user preferences. These experiences evolve through meta-level operations and guide future agent behavior via retrieval-augmented generation. Our experiments across several diverse user-centric environments demonstrate that PRIME achieves performance competitive with gradient-based methods while offering cost-efficiency and interpretability. Overall, PRIME presents a practical paradigm for building proactive, collaborative agents that learn from Human-AI interaction without the computational burden of gradient-based training.
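As a rough illustration of the three-zone experience memory and retrieval step described above, the following Python sketch stores human-readable experiences by zone and prepends the most relevant ones to a task prompt. Only the three zone names come from the abstract. The class and function names (`ExperienceMemory`, `build_prompt`), the word-overlap scoring, and the example entries are all assumptions made for illustration, and the meta-level operations that evolve the memory are not modeled because the abstract does not specify them.

```python
from dataclasses import dataclass, field

# Zone names follow the abstract; everything else below is a hypothetical illustration.
ZONES = ("successful_strategies", "failure_patterns", "user_preferences")


@dataclass
class Experience:
    zone: str
    text: str


@dataclass
class ExperienceMemory:
    items: list = field(default_factory=list)

    def add(self, zone, text):
        """Store one human-readable experience in the given zone."""
        if zone not in ZONES:
            raise ValueError(f"unknown zone: {zone}")
        self.items.append(Experience(zone, text))

    def retrieve(self, query, k=2):
        """Return up to k experiences ranked by word overlap with the query.

        Word overlap is an assumed stand-in for whatever retriever a real
        system would use (for example, embedding similarity).
        """
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(e.text.lower().split())), e)
            for e in self.items
        ]
        scored = [(score, e) for score, e in scored if score > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [e for _, e in scored[:k]]


def build_prompt(memory, task):
    """Prepend retrieved experiences, grouped by zone, to the task text."""
    hits = memory.retrieve(task)
    lines = []
    for zone in ZONES:
        for e in hits:
            if e.zone == zone:
                lines.append(f"[{zone}] {e.text}")
    lines.append(f"Task: {task}")
    return "\n".join(lines)


memory = ExperienceMemory()
memory.add("successful_strategies", "confirm the travel dates before calling the booking tool")
memory.add("failure_patterns", "booking without asking about budget led to a rejected plan")
memory.add("user_preferences", "user prefers morning flights")

prompt = build_prompt(memory, "book morning flights for the user")
print(prompt)
```

Because the memory is plain text keyed by zone, a person can read or edit what the agent has accumulated, which is the kind of interpretability the abstract attributes to explicit experience accumulation.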