🤖 AI Summary
This work addresses the limitations of existing experience-driven learning approaches, which are largely confined to textual environments and rely on handcrafted, fixed memory structures. Such structures struggle with the entangled multimodal experience of perception, reasoning, and action found in real-world settings. The authors propose a paradigm that reframes memory not as a predefined component but as a learnable, adaptive process. Within an end-to-end trainable framework, multimodal representation learning is integrated with meta-learning, enabling agents to dynamically construct, organize, and retrieve memories based on task demands and interaction history. By explicitly treating the memory structure itself as a learning objective, reportedly a first in this domain, the method substantially improves agent performance and generalization across diverse multimodal tasks, demonstrating the critical role of adaptive memory mechanisms in experience-driven learning.
📝 Abstract
Experience-driven learning has emerged as a promising paradigm for enabling agents to improve from interaction trajectories by accumulating and reusing past experience. However, existing approaches are predominantly developed in textual settings and rely on manually designed memory schemas, limiting their applicability to multimodal environments. In real-world scenarios, experience is inherently multimodal, involving heterogeneous signals across perception, reasoning, and action, which makes effective memory design significantly more challenging. In particular, the optimal way to structure and utilize multimodal experience is highly task-dependent and evolves over time, rendering fixed memory designs insufficient. In this work, we propose a new paradigm, learning to learn from multimodal experience, which shifts memory design from a predefined component to an adaptive and learnable process. Our framework enables agents to dynamically construct, organize, and utilize memory based on task requirements and interaction history, effectively learning how to structure experience for improved performance. Experiments demonstrate that adaptive memory design substantially enhances agent performance and generalization across multimodal tasks, highlighting the critical role of learning memory mechanisms in experience-driven learning.