OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering

📅 2024-09-12
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing AI tools support only natural-language retrieval of isolated objects from single images, screenshots, or videos, failing to address complex personal memory queries that require cross-temporal reasoning over sequential, semantically linked events. This work proposes a context-aware memory augmentation framework that aggregates scattered but semantically and temporally related multimodal captures (e.g., photos, screenshots, videos) into contextualized memory units. It integrates multi-source memory retrieval, large language model (LLM)-driven answer generation, and traceable citation mechanisms, grounded in a taxonomy of memory context derived from real user diaries. The approach overcomes limitations of conventional single-point retrieval and static retrieval-augmented generation (RAG). In human evaluation, it achieves 71.5% accuracy and wins or ties against a conventional RAG baseline on 74.5% of test cases.
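The pipeline described above (augment scattered memories with context from temporally adjacent captures, retrieve, then generate a cited answer) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the time-window grouping, keyword-overlap scoring, and the `Memory`/`retrieve`/`answer` names are all assumptions standing in for OmniQuery's taxonomy-driven augmentation, embedding-based retrieval, and LLM prompting.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """One captured memory fragment (photo caption, screenshot OCR text, etc.)."""
    id: str
    text: str
    timestamp: float                                   # seconds since epoch
    context: list = field(default_factory=list)        # filled in by augment()

def augment(memories, window=3600.0):
    """Attach text from temporally adjacent memories as context,
    loosely approximating contextualized memory units."""
    for m in memories:
        m.context = [o.text for o in memories
                     if o.id != m.id and abs(o.timestamp - m.timestamp) <= window]
    return memories

def retrieve(query, memories, k=2):
    """Rank augmented memories by keyword overlap with the query
    (a toy stand-in for semantic retrieval)."""
    q = set(query.lower().split())
    def score(m):
        words = set((m.text + " " + " ".join(m.context)).lower().split())
        return len(q & words)
    return sorted(memories, key=score, reverse=True)[:k]

def answer(query, memories, llm):
    """Generate an answer over retrieved memories and return citation ids."""
    hits = retrieve(query, memories)
    prompt = f"Question: {query}\nMemories:\n" + "\n".join(m.text for m in hits)
    return llm(prompt), [m.id for m in hits]
```

The key effect shown here is that augmentation lets a memory be retrieved even when the query's keywords appear only in a sibling capture from the same event, which is what single-point retrieval misses.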

📝 Abstract
People often capture memories through photos, screenshots, and videos. While existing AI-based tools enable querying this data using natural language, they only support retrieving individual pieces of information like certain objects in photos, and struggle with answering more complex queries that involve interpreting interconnected memories like sequential events. We conducted a one-month diary study to collect realistic user queries and generated a taxonomy of necessary contextual information for integrating with captured memories. We then introduce OmniQuery, a novel system that is able to answer complex personal memory-related questions that require extracting and inferring contextual information. OmniQuery augments individual captured memories through integrating scattered contextual information from multiple interconnected memories. Given a question, OmniQuery retrieves relevant augmented memories and uses a large language model (LLM) to generate answers with references. In human evaluations, we show the effectiveness of OmniQuery with an accuracy of 71.5%, outperforming a conventional RAG system by winning or tying for 74.5% of the time.
Problem

Research questions and friction points this paper is trying to address.

Existing AI tools retrieve only isolated pieces of information (e.g., specific objects) from single photos, screenshots, or videos
Complex queries that require interpreting interconnected memories, such as sequential events, go unanswered
Contextual information needed to answer such queries is scattered across multiple captured memories
Innovation

Methods, ideas, or system contributions that make the work stand out.

Taxonomy of necessary contextual information, derived from a one-month diary study of realistic user queries
Augmentation of individual captured memories with scattered context from interconnected memories
LLM-based answer generation over retrieved augmented memories, with references to the source memories
Jiahao Nick Li
UCLA, Los Angeles, USA
Zhuohao Zhang
University of Washington, Seattle, USA
Jiaju Ma
Stanford University
Animation · Computer Graphics · HCI · Computer Vision