PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work targets the inefficiency, forgetting, and continual-learning difficulties that embodied agents face in complex environments when they rely on retrieval-based memory. The authors propose a parametric memory framework that internalizes experience as skills encoded directly in model parameters. The approach pairs a slow, deliberative large language model for high-level task reasoning with a fast multimodal Mixture-of-Experts LoRA module for action execution; the fast module keeps a physically isolated adapter per skill category, which is what allows parameter-level continual learning without catastrophic forgetting. Failure-correction trajectory pairs serve as the core learning signal and are internalized through a joint behavioral-cloning and contrastive objective. Consolidation is governed by a parameterization-worthiness score, which decides which experience to internalize, and a scale-free self-triggering mechanism, which decides when to internalize without task-specific hand-tuned thresholds, so the trigger transfers across task distributions without re-tuning. In Minecraft experiments, the method improves long-horizon task performance, mitigates forgetting of previously consolidated skills, and improves efficiency relative to retrieval-based embodied agents and parametric memory variants.
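To make the idea of learning from failure-correction pairs concrete, here is a minimal sketch of one way a behavioral-cloning term and a contrastive term could be combined for a single pair. The hinge form of the contrastive term, the function name `joint_loss`, and the `margin` and `weight` parameters are all assumptions for illustration; the summary does not specify the paper's actual objective.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def joint_loss(logits, corrected_action, failed_action, margin=1.0, weight=0.5):
    """Hypothetical joint objective for one failure-correction pair.

    bc: negative log-likelihood of the corrected action (behavioral cloning).
    contrastive: hinge penalty that is zero only when the corrected action's
    log-probability exceeds the failed action's by at least `margin`.
    Both `margin` and `weight` are illustrative, not values from the paper.
    """
    logp = log_softmax(logits)
    bc = -logp[corrected_action]
    gap = logp[corrected_action] - logp[failed_action]
    contrastive = max(0.0, margin - gap)
    return bc + weight * contrastive


# A policy that already prefers the corrected action (index 0) over the
# failed one (index 1) incurs a lower loss than one with the preference reversed.
good = joint_loss([2.0, 0.0, 0.0], corrected_action=0, failed_action=1)
bad = joint_loss([2.0, 0.0, 0.0], corrected_action=1, failed_action=0)
```

In this sketch the behavioral-cloning term teaches what succeeds, while the contrastive term penalizes any policy that does not separate the corrected action from the failed one.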

📝 Abstract

We present PEAM, a Parametric Embodied Agent Memory framework in Minecraft that transforms agent memory from inference-time retrieval into parameter-resident skills internalized through experience. PEAM pairs a slow deliberative LLM for open-ended reasoning with a fast parametric module for reflexive execution of consolidated skills. The fast module is a multimodal Mixture-of-Experts LoRA architecture with per-category physically isolated adapters, enabling parameter-level continual learning without catastrophic forgetting. We treat failure as a first-class training signal: failure-correction trajectory pairs are internalized through a joint behavioral-cloning and contrastive objective, so the agent learns not only what succeeds but also how corrected actions differ from failed ones. To govern consolidation, PEAM introduces a parameterization-worthiness score for deciding which experience should be internalized, and a scale-free self-triggered consolidation mechanism for deciding when to internalize without task-specific hand-tuned thresholds, making the agent self-evolving as the trigger transfers across task distributions without re-tuning. Experiments in Minecraft show that PEAM improves long-horizon task performance, mitigates forgetting on previously consolidated skills, and improves parametric-versus-retrieval efficiency over retrieval-based embodied agents and parametric memory variants.
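The claim that per-category isolated adapters avoid forgetting can be illustrated with a small sketch: if each skill category owns its own low-rank matrices, an update to one category cannot change another category's output. Everything below is an assumption for illustration, including the class name `IsolatedLoRABank`, the category names, and selecting an adapter by explicit label rather than through a learned Mixture-of-Experts router; it is not the paper's implementation.

```python
class IsolatedLoRABank:
    """Hypothetical bank holding one low-rank adapter per skill category."""

    def __init__(self, dim, rank):
        self.dim, self.rank = dim, rank
        self.adapters = {}  # category -> (A, B), never shared between categories

    def add_category(self, category):
        # A is rank x dim, B is dim x rank. B starts at zero, as is common for
        # LoRA, so a fresh adapter contributes nothing until it is trained.
        # A is a constant here only to keep the sketch deterministic.
        A = [[0.01] * self.dim for _ in range(self.rank)]
        B = [[0.0] * self.rank for _ in range(self.dim)]
        self.adapters[category] = (A, B)

    def delta(self, category, x):
        """Return the low-rank update B @ (A @ x) for one category."""
        A, B = self.adapters[category]
        h = [sum(a * xi for a, xi in zip(row, x)) for row in A]
        return [sum(b * hi for b, hi in zip(row, h)) for row in B]

    def consolidate(self, category, grad_B, lr=0.1):
        """Gradient step that touches only the named category's B matrix."""
        _, B = self.adapters[category]
        for i in range(self.dim):
            for j in range(self.rank):
                B[i][j] -= lr * grad_B[i][j]


bank = IsolatedLoRABank(dim=3, rank=2)
bank.add_category("mining")    # hypothetical category names
bank.add_category("crafting")

x = [1.0, 2.0, 3.0]
before = bank.delta("crafting", x)
bank.consolidate("mining", [[1.0, 1.0] for _ in range(3)])
after = bank.delta("crafting", x)  # identical to `before`: crafting is untouched
```

Because no parameters are shared between categories in this sketch, consolidating a new skill leaves every previously consolidated adapter bit-for-bit unchanged, which is the property the abstract attributes to physical isolation.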

Problem

Research questions and friction points this paper is trying to address.

embodied agent memory

parametric memory

continual learning

catastrophic forgetting

experience internalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Parametric Memory

Contrastive Internalization

Mixture-of-Experts LoRA