From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG

📅 2026-05-18

🤖 AI Summary
This work addresses the deployment of large language model agents on resource-constrained devices, where tight memory budgets limit how much personalized context can be stored for retrieval-augmented generation (RAG). The authors propose EPIC (Efficient Preference-aligned Index Construction), which treats user preferences as the central criterion for memory construction across the RAG pipeline. EPIC combines preference-aligned indexing, selective retention of preference-relevant information, and streaming updates to keep retrieval aligned with the user at low memory cost. Across four benchmarks, relative to the best-performing baseline, EPIC reduces index memory by 2,404×, lowers retrieval latency by 33.33×, and improves preference-following accuracy by 20.17 percentage points. In the on-device experiment, it maintains a memory footprint under 1 MB with a latency of 29.35 ms per query under streaming updates.
📝 Abstract
With the rapid emergence of personal AI agents based on Large Language Models (LLMs), implementing them on-device has become essential for privacy and responsiveness. To handle the inherently personal and context-dependent nature of real-world requests, such agents must ground their generation in device-resident personal context. However, under tight memory budgets, the core bottleneck is what to store so that retrieval remains aligned with the user. We propose EPIC (Efficient Preference-aligned Index Construction), which focuses on user preferences as a compact and stable form of personal context and integrates them throughout the RAG pipeline. EPIC selectively retains preference-relevant information from raw data and aligns retrieval toward preference-aligned contexts. Across four benchmarks covering conversations, debates, explanations, and recommendations, EPIC reduces indexing memory by 2,404 times, improves preference-following accuracy by 20.17 percentage points, and achieves 33.33 times lower retrieval latency over the best-performing baseline. In our on-device experiment, EPIC maintains a memory footprint under 1 MB with 29.35 ms/query latency in streaming updates.
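The general idea of retaining only preference-relevant information under a fixed memory budget, and then retrieving from that compact store, can be illustrated with a toy sketch. This is not the authors' implementation: the cue-phrase filter, the bag-of-words similarity, the oldest-first eviction, and every name below (`PREFERENCE_CUES`, `PreferenceMemory`, `max_entries`) are assumptions made purely for illustration, since the abstract does not specify how EPIC detects preferences or builds its index.

```python
import math
import re
from collections import Counter

# Hypothetical cue phrases (an assumption): the abstract does not say
# how preference-relevant information is actually identified.
PREFERENCE_CUES = ("i prefer", "i like", "i love", "i dislike", "i hate", "i avoid")


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class PreferenceMemory:
    """Toy bounded store that keeps only preference-bearing utterances."""

    def __init__(self, max_entries=64):
        self.max_entries = max_entries
        self.entries = []  # list of (text, bag-of-words Counter)

    def update(self, utterance):
        """Streaming update: store the utterance only if it states a preference."""
        lowered = utterance.lower()
        if not any(cue in lowered for cue in PREFERENCE_CUES):
            return False  # discarded, so it costs no index memory
        self.entries.append((utterance, Counter(tokenize(utterance))))
        if len(self.entries) > self.max_entries:
            self.entries.pop(0)  # evict the oldest entry (an assumed policy)
        return True

    def retrieve(self, query, k=1):
        """Return up to k stored preferences with nonzero similarity to the query."""
        q = Counter(tokenize(query))
        scored = [(cosine(q, vec), text) for text, vec in self.entries]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for score, text in scored[:k] if score > 0]


if __name__ == "__main__":
    memory = PreferenceMemory(max_entries=2)
    for line in [
        "The meeting is at 3 pm.",
        "I prefer vegetarian food.",
        "I like jazz music in the evening.",
    ]:
        memory.update(line)
    print(memory.retrieve("suggest a vegetarian restaurant"))
```

In this sketch the memory saving comes entirely from discarding utterances that carry no preference before they reach the index. A real system would presumably replace the keyword filter and bag-of-words scoring with learned components.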
Problem

Research questions and friction points this paper is trying to address.

on-device RAG
preference alignment
memory efficiency
personal AI agents
retrieval grounding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Preference-aligned RAG
On-device AI
Memory-efficient indexing
Personalized LLM agents
EPIC