🤖 AI Summary
This work addresses the challenge of enabling AI systems to adapt their inference-time strategies from interactive experience, without offline retraining or compromising deployment stability. The authors propose an experience-guided, inference-time strategy adaptation framework built on a dual-component meta-policy architecture (a Guide and a Consolidator), integrating structured memory modeling, feedback-driven candidate strategy generation, and tunable mechanisms spanning prompt engineering, sampling parameter selection, tool configuration, and control logic synthesis. Notably, this is the first approach to perform meta-policy learning exclusively at inference time, supporting autonomous switching between agent and workflow modes and full-pipeline dynamic reconfiguration. Evaluated on five high-difficulty benchmarks, the method achieves up to a 14% absolute accuracy improvement and reduces computational cost by up to 111×, with performance consistently improving as interaction experience accumulates.
📝 Abstract
Enabling agentic AI systems to adapt their problem-solving approaches based on post-training interactions remains a fundamental challenge. While systems that update and maintain a memory at inference time have been proposed, existing designs only steer the system by modifying textual input to a language model or agent, which means that they cannot change sampling parameters, remove tools, modify system prompts, or switch between agentic and workflow paradigms. On the other hand, systems that adapt more flexibly require offline optimization and remain static once deployed. We present Experience-Guided Reasoner (EGuR), which generates tailored strategies -- complete computational procedures involving LLM calls, tools, sampling parameters, and control logic -- dynamically at inference time based on accumulated experience. We achieve this using an LLM-based meta-strategy -- a strategy that outputs strategies -- enabling adaptation of all strategy components (prompts, sampling parameters, tool configurations, and control logic). EGuR operates through two components: a Guide generates multiple candidate strategies conditioned on the current problem and structured memory of past experiences, while a Consolidator integrates execution feedback to improve future strategy generation. This produces complete, ready-to-run strategies optimized for each problem, which can be cached, retrieved, and executed as needed without wasting resources. Across five challenging benchmarks (AIME 2025, 3-SAT, and three Big Bench Extra Hard tasks), EGuR achieves up to 14% accuracy improvements over the strongest baselines while reducing computational costs by up to 111×, with both metrics improving as the system gains experience.
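The Guide/Consolidator loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: every class and function name here (`Strategy`, `Memory`, `guide`, `consolidator`) is hypothetical, the Guide is a heuristic stub standing in for an LLM-based meta-strategy, and strategy execution is omitted entirely.

```python
from dataclasses import dataclass, field

@dataclass
class Strategy:
    # A complete computational procedure: prompt, sampling parameters,
    # tool configuration, and control logic (agent vs. workflow mode).
    system_prompt: str
    temperature: float
    tools: list
    mode: str  # "agent" or "workflow"

@dataclass
class Memory:
    # Structured record of past experiences: (problem, strategy, feedback).
    records: list = field(default_factory=list)

    def add(self, problem, strategy, feedback):
        self.records.append((problem, strategy, feedback))

def guide(problem, memory, n_candidates=3):
    # Hypothetical Guide: propose candidate strategies conditioned on the
    # current problem and accumulated memory. The real system calls an LLM
    # here; this stub reuses the last successful strategy and sweeps the
    # sampling temperature to produce diverse candidates.
    successes = [s for (_, s, ok) in memory.records if ok]
    base = successes[-1] if successes else Strategy(
        "Solve step by step.", 0.7, [], "workflow")
    return [Strategy(base.system_prompt, round(0.2 + 0.3 * i, 1),
                     base.tools, base.mode)
            for i in range(n_candidates)]

def consolidator(memory, problem, strategy, feedback):
    # Hypothetical Consolidator: fold execution feedback back into memory
    # so future Guide calls generate better strategies.
    memory.add(problem, strategy, feedback)

# One inference-time adaptation step (strategy execution is stubbed out).
memory = Memory()
candidates = guide("example problem", memory)
chosen = candidates[0]
consolidator(memory, "example problem", chosen, feedback=True)
```

Because a full strategy object is produced for each problem, it can be cached and retrieved later, which is how the abstract's cost reductions become possible: a solved problem class need not trigger fresh strategy generation.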