🤖 AI Summary
This work addresses efficiency bottlenecks in retrieval-augmented generation (RAG) for personalized large language models on edge devices. Growing volumes of profile data strain computational resources, and Computing-in-Memory (CiM) architectures, while reducing data movement, are susceptible to environmental noise that degrades retrieval accuracy and adaptability across domains. To tackle these challenges, the authors propose TONEL (Task-Oriented Noise-resilient Embedding Learning), a framework that, for the first time, jointly models task specificity, noise robustness, and CiM hardware constraints. TONEL uses a noise-aware projection model to learn task-specific embeddings that remain accurate under noise and fit CiM hardware constraints. Experiments on multiple personalization benchmarks show that TONEL outperforms strong baselines, delivering accurate, low-overhead, domain-adaptive RAG, particularly in task-specific noisy scenarios.
📝 Abstract
Personalized virtual assistants powered by large language models (LLMs) on edge devices are attracting growing attention, with Retrieval-Augmented Generation (RAG) emerging as a key method for personalization by retrieving relevant profile data and generating tailored responses. However, deploying RAG on edge devices faces efficiency hurdles due to the rapid growth of profile data, such as user-LLM interactions and recent updates. While Computing-in-Memory (CiM) architectures mitigate this bottleneck by eliminating data movement between memory and processing units via in-situ operations, they are susceptible to environmental noise that can degrade retrieval precision. This poses a critical issue in dynamic, multi-domain edge-based scenarios (e.g., travel, medicine, and law) where both accuracy and adaptability are paramount. To address these challenges, we propose Task-Oriented Noise-resilient Embedding Learning (TONEL), a framework that improves noise robustness and domain adaptability for RAG in noisy edge environments. TONEL employs a noise-aware projection model to learn task-specific embeddings compatible with CiM hardware constraints, enabling accurate retrieval under noisy conditions. Extensive experiments conducted on personalization benchmarks demonstrate the effectiveness and practicality of our methods relative to strong baselines, especially in task-specific noisy scenarios.
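The abstract does not give implementation details, so the following is only a minimal sketch of the general setting it describes: embeddings are projected to a compact space, quantized to a low precision that stands in for a CiM constraint, perturbed by noise, and then searched by inner product. Everything here is an assumption made for illustration and is not the authors' TONEL implementation: the function names, the 8-bit symmetric quantization, the additive Gaussian noise model, and the use of a fixed projection matrix `W` rather than a learned one.

```python
import numpy as np


def project(embeddings, W):
    """Linearly project embeddings with W, then L2-normalise each row.

    W is a placeholder here; a noise-aware method would learn it.
    """
    z = embeddings @ W
    return z / np.linalg.norm(z, axis=1, keepdims=True)


def quantize_int8(z):
    """Symmetric 8-bit quantisation of values in [-1, 1].

    Assumed stand-in for a CiM precision limit.
    """
    return np.clip(np.round(z * 127.0), -127, 127).astype(np.int8)


def add_device_noise(q, sigma, rng):
    """Add zero-mean Gaussian noise to stored values.

    Assumed noise model; real device noise may behave differently.
    """
    return q.astype(np.float64) + rng.normal(0.0, sigma, size=q.shape)


def retrieve(query_vec, stored, top_k=1):
    """Return indices of the top_k stored rows ranked by inner product."""
    scores = stored @ query_vec
    return np.argsort(-scores)[:top_k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    docs = rng.normal(size=(5, 16))   # toy "profile" embeddings
    W = rng.normal(size=(16, 8))      # hypothetical projection matrix
    stored = add_device_noise(quantize_int8(project(docs, W)), sigma=4.0, rng=rng)
    query = project(docs[2:3], W)[0]  # query with a stored item's own embedding
    print(retrieve(query, stored, top_k=3))
```

Raising `sigma` in this toy setup is one way to see how retrieval rankings can shift as stored values become noisier. A noise-aware approach in the spirit of the abstract would presumably inject such noise while learning the projection, which this sketch does not do.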