🤖 AI Summary
To address the weak long-term memory and poor interpretability of intelligent personal assistants in real-world scenarios, this paper proposes the first embodied memory system framework. It integrates vision-language models (VLMs) with large language models (LLMs) for multimodal perception and structured information extraction, and constructs a unified memory representation that combines a knowledge graph with vector embeddings to support retrieval-augmented question answering driven by both semantic search and graph querying. Its key innovation is the first integration of VLM-based understanding, graph-based knowledge modeling, and vector memory within a closed-loop embodied memory architecture, achieving seamless coupling of perception, memory, and reasoning. Experiments on real-world cases show significant improvements in temporal event-memory consistency, relational traceability, and complex question-answering accuracy. The framework provides verifiable, interpretable long-term memory support for high-reliability cognitive-assistance applications.
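The closed perception–memory–reasoning loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's actual implementation: the VLM and LLM calls are stubbed out, and all function names (`vlm_caption`, `llm_extract_triples`, `perceive`) are hypothetical.

```python
# Illustrative sketch of the closed-loop architecture: an image is captioned
# by a VLM, an LLM extracts structured triples from the caption, and the
# triples are written into graph-structured memory. Model calls are stubbed.

def vlm_caption(image):
    # stand-in for a vision-language model captioning call
    return "Alice places her keys in the kitchen drawer"

def llm_extract_triples(caption):
    # stand-in for LLM-based structured information extraction
    return [("Alice", "places", "keys"),
            ("keys", "located_in", "kitchen drawer")]

def perceive(image, memory):
    caption = vlm_caption(image)                  # 1. multimodal perception
    for triple in llm_extract_triples(caption):   # 2. structured extraction
        memory.append(triple)                     # 3. write to graph memory
    return memory

memory = perceive(image=None, memory=[])
print(memory)
```

Downstream question answering would then retrieve from `memory` rather than re-processing raw sensor data, which is what makes the stored events traceable and auditable.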
📝 Abstract
A wide variety of agentic AI applications, ranging from cognitive assistants for dementia patients to robotics, demand a robust memory system grounded in reality. In this paper, we propose such a memory system consisting of three components. First, we combine Vision Language Models for image captioning and entity disambiguation with Large Language Models for consistent information extraction during perception. Second, the extracted information is represented in a memory consisting of a knowledge graph enhanced by vector embeddings to efficiently manage relational information. Third, we combine semantic search and graph query generation for question answering via Retrieval Augmented Generation. We illustrate the system's operation and potential using a real-world example.
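The second and third components, a graph memory with a vector index and hybrid retrieval over both, can be sketched as follows. This is a toy in-memory version under stated assumptions: the class name `MemoryStore`, the two-dimensional embeddings, and the one-hop expansion strategy are all illustrative, not the paper's design.

```python
# Hypothetical sketch of hybrid retrieval: semantic (vector) search picks the
# most relevant entity, then a pattern query over knowledge-graph triples
# expands its neighbourhood as grounded context for an LLM answer.
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = sqrt(sum(x * x for x in a)), sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    def __init__(self):
        self.triples = []        # knowledge graph: (subject, relation, object)
        self.embeddings = {}     # vector index: entity -> embedding

    def add(self, subj, rel, obj, subj_vec):
        self.triples.append((subj, rel, obj))
        self.embeddings[subj] = subj_vec

    def graph_query(self, subj=None, rel=None, obj=None):
        # wildcard pattern match over triples (None matches anything)
        return [t for t in self.triples
                if (subj is None or t[0] == subj)
                and (rel is None or t[1] == rel)
                and (obj is None or t[2] == obj)]

    def semantic_search(self, query_vec, k=1):
        ranked = sorted(self.embeddings.items(),
                        key=lambda kv: cosine(query_vec, kv[1]),
                        reverse=True)
        return [name for name, _ in ranked[:k]]

    def retrieve(self, query_vec):
        # hybrid retrieval: nearest entity by embedding similarity,
        # then its outgoing graph edges as retrieved context
        entity = self.semantic_search(query_vec, k=1)[0]
        return self.graph_query(subj=entity)

mem = MemoryStore()
mem.add("Alice", "took", "medication", [1.0, 0.0])
mem.add("keys", "located_in", "kitchen drawer", [0.0, 1.0])
print(mem.retrieve([0.1, 0.9]))  # → [('keys', 'located_in', 'kitchen drawer')]
```

In a Retrieval Augmented Generation setup, the triples returned by `retrieve` would be serialized into the prompt, so the answer ("your keys are in the kitchen drawer") is traceable to specific, stored graph edges rather than to opaque model weights.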