A Decentralized Retrieval Augmented Generation System with Source Reliabilities Secured on Blockchain

📅 2025-11-10
🤖 AI Summary
Existing centralized RAG systems suffer from high operational costs and privacy concerns, and decentralizing them introduces a new problem: data sources of uneven reliability. To address these challenges, this paper proposes a blockchain-based decentralized RAG framework. The method integrates (1) a dynamic reliability scoring mechanism that evaluates each source by the quality of the responses it contributes to and prioritizes high-quality sources during retrieval; (2) smart contracts for transparent, tamper-proof, and decentralized management of score generation, updates, and verification; and (3) a hybrid architecture combining decentralized storage, batched state synchronization, and multi-source fusion retrieval. Experiments with Llama 3B and 8B models in two simulated environments with six data sources of differing reliability show a 10.7% improvement over the centralized counterpart when sources are unreliable, performance approaching the upper bound of centralized systems when sources are ideally reliable, and approximately 56% marginal cost savings from batched updates. The system is open-sourced.

📝 Abstract
Existing retrieval-augmented generation (RAG) systems typically use a centralized architecture, which incurs high costs for data collection, integration, and management and raises privacy concerns. There is a great need for a decentralized RAG system that enables foundation models to use information directly from data owners who maintain full control over their sources. However, decentralization brings a challenge: the numerous independent data sources vary significantly in reliability, which can diminish retrieval accuracy and response quality. To address this, our decentralized RAG system has a novel reliability scoring mechanism that dynamically evaluates each source based on the quality of the responses it helps generate and prioritizes high-quality sources during retrieval. To ensure transparency and trust, the scoring process is securely managed through blockchain-based smart contracts, creating verifiable and tamper-proof reliability records without relying on a central authority. We evaluate our decentralized system with two Llama models (3B and 8B) in two simulated environments where six data sources have different levels of reliability. Our system achieves a 10.7% performance improvement over its centralized counterpart in the real-world-like unreliable data environment. Notably, it approaches the upper-bound performance of centralized systems in the ideally reliable data environment. The decentralized infrastructure enables secure and trustworthy scoring management, achieving approximately 56% marginal cost savings through batched update operations. Our code and system are open-sourced at github.com/yining610/Reliable-dRAG.
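To make the scoring idea concrete, here is a minimal Python sketch of reliability-weighted retrieval. It is not the authors' implementation: the abstract does not state the update rule, so the exponential moving average, the `alpha` smoothing factor, the neutral starting score of 0.5, and the names `SourceRegistry` and `rerank` are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SourceRegistry:
    """Tracks one reliability score in [0, 1] per data source (hypothetical design)."""
    alpha: float = 0.2           # assumed smoothing factor, not taken from the paper
    initial_score: float = 0.5   # assumed neutral prior for sources not yet scored
    scores: dict = field(default_factory=dict)

    def get(self, source_id: str) -> float:
        """Return the current score, or the neutral prior for an unseen source."""
        return self.scores.get(source_id, self.initial_score)

    def update(self, source_id: str, quality: float) -> float:
        """Blend the latest response-quality signal (0..1) into the running score."""
        if not 0.0 <= quality <= 1.0:
            raise ValueError("quality must be in [0, 1]")
        new_score = (1 - self.alpha) * self.get(source_id) + self.alpha * quality
        self.scores[source_id] = new_score
        return new_score


def rerank(candidates, registry, top_k=3):
    """Rank passages by similarity multiplied by the reliability of their source.

    candidates: list of (source_id, passage, similarity) tuples.
    Returns the top_k (source_id, passage) pairs, best first.
    """
    weighted = [(sim * registry.get(src), src, text) for src, text, sim in candidates]
    weighted.sort(key=lambda item: item[0], reverse=True)
    return [(src, text) for _, src, text in weighted[:top_k]]
```

With this sketch, a passage from a source whose responses were judged poor can be outranked by a slightly less similar passage from a source with a better track record, which is the prioritization behavior the abstract describes.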
Problem

Research questions and friction points this paper is trying to address.

Decentralized RAG systems face unreliable data sources affecting accuracy
Centralized RAG systems have high costs and privacy concerns
Managing source reliability transparently without central authority is challenging
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decentralized RAG system with blockchain-secured reliability scoring
Dynamic source evaluation based on response quality contributions
Smart contracts ensure transparent tamper-proof reliability records
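The combination of tamper-evident records and batched updates can be illustrated with a small Python sketch. This is a local stand-in, not the paper's smart contract: a SHA-256 hash chain plays the role of the on-chain record, and `ScoreLedger`, `BatchedWriter`, and the `batch_size` parameter are hypothetical names chosen for this example. The sketch makes no claim about actual on-chain costs.

```python
import hashlib
import json


class ScoreLedger:
    """Append-only, hash-chained log of score updates (local stand-in for a contract)."""

    GENESIS = "0" * 64  # placeholder "previous hash" for the first block

    def __init__(self):
        self.blocks = []  # each block: {"prev": str, "updates": dict, "hash": str}

    @staticmethod
    def _digest(prev, updates):
        # Canonical JSON so the same content always hashes to the same value.
        payload = json.dumps({"prev": prev, "updates": updates}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def commit(self, updates):
        """Record one batch of {source_id: score} updates as a single block."""
        prev = self.blocks[-1]["hash"] if self.blocks else self.GENESIS
        block = {"prev": prev, "updates": dict(updates),
                 "hash": self._digest(prev, updates)}
        self.blocks.append(block)
        return block["hash"]

    def verify(self):
        """Recompute every hash; any edited block breaks the chain."""
        prev = self.GENESIS
        for block in self.blocks:
            expected = self._digest(prev, block["updates"])
            if block["prev"] != prev or block["hash"] != expected:
                return False
            prev = block["hash"]
        return True


class BatchedWriter:
    """Buffers score updates and commits them together (assumed batching policy)."""

    def __init__(self, ledger, batch_size=4):
        self.ledger = ledger
        self.batch_size = batch_size
        self.pending = {}

    def submit(self, source_id, score):
        # A later update for the same source overwrites the buffered one.
        self.pending[source_id] = score
        # Commit once the buffer holds batch_size distinct sources.
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Commit whatever is buffered as one block, if anything is pending."""
        if self.pending:
            self.ledger.commit(self.pending)
            self.pending = {}
```

Grouping several score updates into one commit means any fixed per-transaction overhead is paid once per batch rather than once per update, which is the general reason batching can lower marginal cost. The trade-off is that scores reach the ledger with a delay of up to one batch.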