Episodic Memory Verbalization using Hierarchical Representations of Life-Long Robot Experience

📅 2024-09-26
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work addresses the challenge of verbalizing a robot's long-term experience, spanning months of multimodal perceptual and proprioceptive data, in order to enable natural, interpretable human-robot interaction. The authors propose a lifelong experience verbalization framework that combines a hierarchical memory tree (perception → event → linguistic concept) with large language models (LLMs). The tree-structured semantic representation organizes cross-episode experiences in a temporally coherent way, while LLM-driven dynamic retrieval with zero-/few-shot prompting supports efficient summarization and interactive question answering over long-term memory. In contrast to conventional short-horizon, fragment-based memory architectures, the approach unifies hierarchical memory modeling with LLM-based retrieval. Evaluated on simulated household robot data, egocentric video datasets, and real-robot recordings, the framework achieves low-overhead, high-fidelity verbalization of multi-month experiences.

📝 Abstract
Verbalization of robot experience, i.e., summarization of and question answering about a robot's past, is a crucial ability for improving human-robot interaction. Previous works applied rule-based systems or fine-tuned deep models to verbalize short (several-minute-long) streams of episodic data, limiting generalization and transferability. In our work, we apply large pretrained models to tackle this task with zero or few examples, and specifically focus on verbalizing life-long experiences. For this, we derive a tree-like data structure from episodic memory (EM), with lower levels representing raw perception and proprioception data, and higher levels abstracting events to natural language concepts. Given such a hierarchical representation built from the experience stream, we apply a large language model as an agent to interactively search the EM given a user's query, dynamically expanding (initially collapsed) tree nodes to find the relevant information. The approach keeps computational costs low even when scaling to months of robot experience data. We evaluate our method on simulated household robot data, human egocentric videos, and real-world robot recordings, demonstrating its flexibility and scalability.
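The tree-like episodic memory described in the abstract, with raw perception/proprioception data at the leaves and natural-language abstractions at higher levels, can be sketched as follows. This is a minimal illustration, not the paper's implementation; all names (`EMNode`, `build_day_node`) are hypothetical, and the join-based summary stands in for an LLM-generated abstraction.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class EMNode:
    """One node of the episodic-memory tree (illustrative structure)."""
    summary: str                      # natural-language abstraction of this subtree
    children: List["EMNode"] = field(default_factory=list)
    raw_data: Optional[str] = None    # leaf level: raw perception/proprioception
    expanded: bool = False            # collapsed by default to bound context size

def build_day_node(event_summaries: List[str]) -> EMNode:
    """Abstract a list of event summaries into one higher-level node."""
    events = [EMNode(summary=s) for s in event_summaries]
    # Stand-in for an LLM summarization step over the child events:
    day_summary = "; ".join(event_summaries)
    return EMNode(summary=day_summary, children=events)

day = build_day_node(["made coffee", "tidied the kitchen"])
```

Keeping nodes collapsed by default is what lets the search stay cheap: only the summaries of opened nodes ever enter the language model's context.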
Problem

Research questions and friction points this paper is trying to address.

Verbalizing lifelong robot experiences for human-robot interaction
Overcoming limitations of rule-based systems in episodic memory summarization
Scaling hierarchical memory representation for efficient query processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical tree-like data structure for EM
Large pretrained models for zero/few-shot verbalization
Dynamic tree node expansion for efficient querying
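The dynamic node expansion listed above can be sketched as an agent loop over a frontier of visible nodes: at each step the LLM picks one collapsed node to open, its children replace it in the frontier, and the loop stops when the agent finds the relevant information. This is a toy sketch under stated assumptions; `Node`, `search_em`, and the keyword-matching stand-in for the LLM agent are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Node:
    summary: str
    children: List["Node"] = field(default_factory=list)

def search_em(root: Node, query: str,
              choose: Callable[[str, List[Node]], Optional[Node]],
              max_expansions: int = 10) -> List[Node]:
    """Interactively search the EM tree by expanding one node per step.

    `choose` stands in for the LLM agent: given the query and the current
    frontier of visible nodes, it returns the node to expand, or None when
    it has found what it needs. Cost stays bounded by max_expansions.
    """
    frontier = [root]
    for _ in range(max_expansions):
        node = choose(query, frontier)
        if node is None or not node.children:
            break
        frontier.remove(node)          # replace the node by its children
        frontier.extend(node.children)
    return frontier

# Toy agent: expand any inner node whose summary mentions a query keyword.
def keyword_agent(query: str, frontier: List[Node]) -> Optional[Node]:
    for n in frontier:
        if n.children and any(word in n.summary for word in query.split()):
            return n
    return None

tree = Node("week: cooking and cleaning", [
    Node("Monday: made coffee, tidied kitchen"),
    Node("Tuesday: watered plants"),
])
result = search_em(tree, "coffee cooking", keyword_agent)
```

Because only summaries of frontier nodes are ever shown to the agent, context size grows with the number of expansions, not with the total months of recorded experience.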
🔎 Similar Papers
Leonard Bärmann
Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Germany
Chad DeChant
PhD student, Computer Science, Columbia University
natural language processing, robotics, AI safety and policy
Joana Plewnia
Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Germany
Fabian Peller-Konrad
Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Germany
Daniel Bauer
University of North Carolina
Tamim Asfour
Karlsruhe Institute of Technology (KIT)
Humanoid Robotics, Humanoid Robots
Alexander H. Waibel
Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Germany