Recursive Question Understanding for Complex Question Answering over Heterogeneous Personal Data

📅 2025-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses complex question answering over heterogeneous personal data stored on user devices (e.g., calendars, fitness logs, shopping records, streaming histories) while preserving user privacy. Method: We propose ReQAP, the first fully on-device, privacy-preserving QA framework. Its recursive question decomposition mechanism generates executable operator trees that process structured and unstructured data uniformly. The framework comprises a language-model-based semantic parser, a lightweight operator-driven execution engine, and cross-modal techniques for verbalizing and aligning multimodal personal data. Contribution/Results: We establish PerQA, the first benchmark for personal-data QA built upon realistic user profiles. On PerQA, our framework significantly outperforms baselines and supports accurate, auditable, multi-hop analytical queries with low on-device resource consumption and without any data upload.

📝 Abstract

Question answering over mixed sources, like text and tables, has been advanced by verbalizing all contents and encoding them with a language model. A prominent case of such heterogeneous data is personal information: user devices log vast amounts of data every day, such as calendar entries, workout statistics, shopping records, streaming history, and more. Information needs range from simple look-ups to queries of an analytical nature. The challenge is to provide humans with convenient access with a small footprint, so that all personal data stays on the user's devices. We present ReQAP, a novel method that creates an executable operator tree for a given question via recursive decomposition. Operators are designed to enable seamless integration of structured and unstructured sources, and the execution of the operator tree yields a traceable answer. We further release the PerQA benchmark, with persona-based data and questions, covering a diverse spectrum of realistic user needs.

Problem

Research questions and friction points this paper is trying to address.

Answering complex questions over heterogeneous personal data

Integrating structured and unstructured sources seamlessly

Providing traceable answers with small computational footprint

Innovation

Methods, ideas, or system contributions that make the work stand out.

Recursive decomposition for executable operator tree

Seamless integration of structured and unstructured sources

Traceable answer via operator tree execution
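
To make the operator-tree idea concrete, here is a minimal Python sketch of how such a tree might be executed over one structured and one unstructured source while recording a trace. Everything in it is an assumption made for illustration: the operator names (`retrieve`, `filter`, `extract`, `union`, `sum`), the `Node` class, the sample data, and the hand-built tree are hypothetical and are not ReQAP's actual operators or API. In the paper's setting the tree would come from recursive decomposition by a language model; here it is written by hand for the question "How many kilometres did I run?".

```python
import re
from dataclasses import dataclass, field

# Hypothetical on-device data: one structured log and one free-text source.
SOURCES = {
    "workouts": [
        {"type": "run", "date": "2025-05-01", "distance_km": 5.0},
        {"type": "run", "date": "2025-05-08", "distance_km": 7.5},
        {"type": "swim", "date": "2025-05-09", "distance_km": 1.0},
    ],
    "notes": [
        {"date": "2025-05-03", "text": "Ran 3 km with Alex along the river"},
        {"date": "2025-05-04", "text": "Dinner with Sam"},
    ],
}


@dataclass
class Node:
    """One operator in the tree (illustrative, not the paper's definition)."""
    op: str
    args: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def execute(node, trace):
    """Evaluate children first, then this operator; log every intermediate result."""
    inputs = [execute(child, trace) for child in node.children]
    if node.op == "retrieve":
        result = list(SOURCES[node.args["source"]])
    elif node.op == "filter":
        result = [r for r in inputs[0] if node.args["pred"](r)]
    elif node.op == "extract":
        # Turn free text into structured records so both sources can be combined.
        result = []
        for r in inputs[0]:
            m = re.search(r"(\d+(?:\.\d+)?)\s*km", r["text"])
            if m:
                result.append({"date": r["date"], "distance_km": float(m.group(1))})
    elif node.op == "union":
        result = inputs[0] + inputs[1]
    elif node.op == "sum":
        result = sum(r[node.args["key"]] for r in inputs[0])
    else:
        raise ValueError(f"unknown operator: {node.op}")
    trace.append((node.op, result))
    return result


# Hand-built tree standing in for the output of question decomposition.
tree = Node("sum", {"key": "distance_km"}, [
    Node("union", children=[
        Node("filter", {"pred": lambda r: r["type"] == "run"},
             [Node("retrieve", {"source": "workouts"})]),
        Node("extract", children=[
            Node("filter", {"pred": lambda r: "ran" in r["text"].lower()},
                 [Node("retrieve", {"source": "notes"})]),
        ]),
    ]),
])

trace = []
answer = execute(tree, trace)
print(answer)
for op, result in trace:
    print(op, "->", result)
```

The trace lists each operator's intermediate result in execution order, which is one simple way an answer could be audited back to the records that produced it. The `extract` step illustrates the integration point: once text is turned into records with the same fields as the structured log, downstream operators do not need to know which source a record came from.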

🔎 Similar Papers

OmniQuery: Contextually Augmenting Captured Multimodal Memory to Enable Personal Question Answering