🤖 AI Summary
This work addresses the challenge of verbalizing a robot's long-term experience, spanning months of multimodal perceptual and proprioceptive data, a scale at which existing methods fail to support natural, interpretable human-robot interaction. The authors propose a lifelong experience verbalization framework that integrates a hierarchical memory tree (perception → event → linguistic concept) with large language models (LLMs). The tree-structured semantic representation organizes cross-episode experiences in a temporally coherent way, while LLM-driven dynamic retrieval, combined with zero- and few-shot prompting, supports efficient summarization and interactive question answering over long-term memory. In contrast to conventional short-horizon, fragment-based memory architectures, the approach couples hierarchical memory modeling directly with LLM-based retrieval. Evaluated on simulated household robot data, egocentric video datasets, and real-robot recordings, the framework verbalizes multi-month experiences with low computational overhead and high fidelity.
📝 Abstract
Verbalization of robot experience, i.e., summarization of and question answering about a robot's past, is a crucial ability for improving human-robot interaction. Previous works applied rule-based systems or fine-tuned deep models to verbalize short (several-minute-long) streams of episodic data, limiting generalization and transferability. In our work, we apply large pretrained models to tackle this task with zero or few examples, and specifically focus on verbalizing lifelong experiences. For this, we derive a tree-like data structure from episodic memory (EM), with lower levels representing raw perception and proprioception data, and higher levels abstracting events to natural language concepts. Given such a hierarchical representation built from the experience stream, we apply a large language model as an agent to interactively search the EM given a user's query, dynamically expanding (initially collapsed) tree nodes to find the relevant information. The approach keeps computational costs low even when scaling to months of robot experience data. We evaluate our method on simulated household robot data, human egocentric videos, and real-world robot recordings, demonstrating its flexibility and scalability.
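The retrieval mechanism described above (an initially collapsed memory tree that an LLM agent expands node by node) can be sketched as follows. This is a minimal illustration, not the paper's implementation: all names (`MemoryNode`, `relevance`, `search`) are hypothetical, and the LLM's relevance judgment is stubbed with simple keyword overlap so the example is self-contained.

```python
# Hedged sketch of hierarchical episodic-memory search. Names are illustrative,
# and the LLM relevance call is replaced by a keyword-overlap stub.
from dataclasses import dataclass, field
from typing import List

@dataclass
class MemoryNode:
    summary: str                       # natural-language abstraction of this subtree
    children: List["MemoryNode"] = field(default_factory=list)
    expanded: bool = False             # the tree starts fully collapsed

def relevance(query: str, summary: str) -> float:
    """Stand-in for an LLM relevance judgment: crude keyword overlap."""
    q, s = set(query.lower().split()), set(summary.lower().split())
    return len(q & s) / max(len(q), 1)

def search(root: MemoryNode, query: str, threshold: float = 0.15) -> List[str]:
    """Expand only subtrees deemed relevant; collect leaf-level events."""
    hits: List[str] = []
    frontier = [root]
    while frontier:
        node = frontier.pop()
        if relevance(query, node.summary) < threshold:
            continue                   # irrelevant subtree stays collapsed
        node.expanded = True
        if node.children:
            frontier.extend(node.children)
        else:
            hits.append(node.summary)  # leaf event enters the answer context
    return hits

# Toy experience stream: two days of household activity, one query.
day1 = MemoryNode("monday kitchen cleaning", [
    MemoryNode("wiped the kitchen counter"),
    MemoryNode("loaded the dishwasher in the kitchen"),
])
day2 = MemoryNode("tuesday garden watering", [MemoryNode("watered the garden plants")])
root = MemoryNode("week of kitchen and garden chores", [day1, day2])

evidence = search(root, "what happened in the kitchen")
```

Because irrelevant subtrees (here, the garden day) are never expanded, the cost of answering a query grows with the relevant portion of the tree rather than with the total amount of recorded experience, which is the scalability property the abstract highlights.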