AI Summary
This work addresses the tendency of static large language models to repeatedly commit reasoning errors, as well as the limitations of existing retrieval-based experience reuse methods, which suffer from noise, high latency, and reliance solely on similarity matching. To overcome these issues, the authors propose SEAM, a lightweight, executor-specific plug-in module that eschews conventional retrieval mechanisms by internalizing experience into learnable parameters. SEAM generates structured, instance-tailored experience entries in a single forward pass to guide frozen large language models toward improved reasoning. Trained with a utility-driven GRPO algorithm using executor rollouts, and further refined through supervised fine-tuning on logs of successful trajectories, SEAM enables post-deployment performance gains without modifying the main model. Experiments demonstrate significant accuracy improvements across multiple frozen executors on mathematical reasoning benchmarks, with minimal computational overhead, confirming the method's effectiveness and robustness.
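The utility-driven training loop described above can be sketched roughly as follows. This is a toy illustration under assumptions, not the paper's implementation: the adapter, the frozen executor, and the reward check are stand-in functions (all names here are hypothetical), and the GRPO update is reduced to its group-relative advantage computation.

```python
import random

def sample_experience_entries(problem, group_size=4):
    # Hypothetical stand-in for the SEAM adapter: sample a group of
    # candidate structured experience entries for one problem instance.
    return [f"entry-{i}-for-{problem}" for i in range(group_size)]

def executor_rollout(problem, entry):
    # Stand-in for the frozen executor: attempt the problem guided by the
    # entry and return a utility reward (1.0 if correct, else 0.0).
    # Here the outcome is random purely for illustration.
    return 1.0 if random.random() < 0.5 else 0.0

def grpo_advantages(rewards):
    # GRPO's critic-free baseline: each entry's advantage is its reward
    # minus the group mean, normalized by the group standard deviation.
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5 or 1.0  # avoid division by zero for uniform groups
    return [(r - mean) / std for r in rewards]

def train_step(problem):
    # One utility-driven step: sample entries, roll out the frozen
    # executor, and score each entry by its group-relative advantage.
    # A real implementation would scale the adapter's log-prob gradients
    # by these advantages; the executor stays frozen throughout.
    entries = sample_experience_entries(problem)
    rewards = [executor_rollout(problem, e) for e in entries]
    return list(zip(entries, grpo_advantages(rewards)))
```

Only the adapter's parameters would receive gradients from these advantages, which is what keeps the executor untouched and makes SEAM a post-deployment plug-in.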
Abstract
Large language models (LLMs) are largely static and often redo reasoning or repeat mistakes. Prior experience reuse typically relies on external retrieval, which is similarity-based, can introduce noise, and adds latency. We introduce SEAM (Structured Experience Adapter Module), a lightweight, executor-specific plug-in that stores experience in its parameters and generates a structured, instance-tailored experience entry in a single forward pass to guide a frozen LLM executor. SEAM is trained for utility via executor rollouts and GRPO while keeping the executor frozen, and it can be further improved after deployment with supervised fine-tuning on logged successful trajectories. Experiments on mathematical reasoning benchmarks show consistent accuracy gains across executors with low overhead. Extensive ablations and analyses further elucidate the mechanisms underlying SEAM's effectiveness and robustness.