AI Summary
This work addresses the challenges of deploying conventional centralized Retrieval-Augmented Generation (RAG) in edge computing environments, where data privacy constraints, device heterogeneity, and the high cost of large language model (LLM) invocation hinder practicality. To overcome these limitations, the authors propose a federated dual-path RAG framework that constructs semantic-aware adaptive hypergraphs locally to encode knowledge structures and distills them into compact question-answering memory. LLMs are invoked only when necessary, decoupling lightweight retrieval from heavyweight reasoning. This approach uniquely integrates federated learning with dual-path RAG, incorporating hypergraph modeling and memory distillation to mitigate cross-device knowledge fragmentation while preserving privacy. Experiments demonstrate up to a 7.8% improvement in question-answering accuracy and an 8.4× reduction in latency, with theoretical analysis establishing an 𝒪(1/ε²) convergence rate for the hypergraph learning component.
Abstract
Retrieval-augmented generation (RAG) has emerged as a paradigm for grounding large language models in external knowledge, yet most existing RAG systems assume centralized knowledge access and ample computation. These assumptions break down in edge environments, where knowledge is fragmented across devices, raw data cannot be shared, and repeated LLM calls are prohibitively expensive. We propose FD-RAG, a federated dual-system RAG framework that decouples lightweight memory access from on-demand LLM reasoning for decentralized deployment. Specifically, FD-RAG learns semantic-aware adaptive hypergraphs over local corpora and distills them into compact QA memories. At inference time, it answers well-covered queries via direct memory matching and invokes LLM-based reasoning only when necessary, while tracing retrieved memories to hypergraph-grounded evidence. To mitigate cross-device knowledge fragmentation, FD-RAG aggregates anonymized memories across devices without exposing raw documents. Experiments on QA benchmarks show that FD-RAG improves accuracy by up to 7.8\% while reducing latency by 8.4$\times$ compared with strong local and federated baselines. We also provide theoretical analysis establishing an $\mathcal{O}(1/\epsilon^{2})$ convergence rate for the proposed hypergraph learning, supporting its tractable deployment in edge settings.
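The inference-time routing described above (answer from memory when a query is well covered, otherwise fall back to the LLM) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the bag-of-words cosine similarity, and the `threshold` value are all assumptions chosen for illustration, and the abstract does not specify how FD-RAG decides that a query is "well covered". Hypergraph construction, memory distillation, and federated aggregation are out of scope here.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption); a real system would use a learned encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer(
    query: str,
    memory: List[Tuple[str, str]],   # hypothetical QA memory: (question, answer) pairs
    llm: Callable[[str], str],       # hypothetical stand-in for the expensive LLM call
    threshold: float = 0.8,          # assumed cutoff for "well covered"
) -> Tuple[str, str]:
    """Return (answer, path), where path is 'memory' or 'llm'."""
    query_vec = embed(query)
    best_score, best_answer = 0.0, None
    for question, stored_answer in memory:
        score = cosine(query_vec, embed(question))
        if score > best_score:
            best_score, best_answer = score, stored_answer
    # Lightweight path: a sufficiently similar stored question answers the query.
    if best_answer is not None and best_score >= threshold:
        return best_answer, "memory"
    # Heavyweight path: no close match, so the LLM is invoked.
    return llm(query), "llm"


# Usage with a one-entry memory and a stub in place of a real LLM.
memory = [("What is the capital of France?", "Paris")]
stub_llm = lambda q: "LLM answer for: " + q
covered = answer("what is the capital of france", memory, stub_llm)
uncovered = answer("How do hypergraphs encode knowledge?", memory, stub_llm)
```

In this sketch the first query matches the stored question token for token and is served from memory, while the second shares no tokens with it and is routed to the stub LLM. The threshold trades latency against accuracy: a lower value serves more queries from memory at the risk of returning a mismatched answer.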