Budget-Constrained Online Retrieval-Augmented Generation: The Chunk-as-a-Service Model

📅 2026-04-28
🤖 AI Summary
This study addresses the opacity and cost inefficiency of existing RAG-as-a-Service (RaaS) billing models, which charge per submitted prompt without accounting for the relevance or actual use of the retrieved chunks. The authors propose Chunk-as-a-Service (CaaS), a framework with transparent, chunk-level billing, together with the Utility-Cost Online Selection Algorithm (UCOSA), which decides online which prompts to enrich with high-value chunks under a budget constraint. In experiments, UCOSA outperforms random selection by approximately 52% and reaches about 75% of the performance of offline selection. The two CaaS variants, LB-CaaS and OB-CaaS, achieve performance-to-budget ratios that surpass conventional RaaS by 140% and 86%, respectively.
📝 Abstract
Large Language Models (LLMs) have revolutionized the field of natural language processing. However, they exhibit some limitations, including a lack of reliability and transparency: they may hallucinate and fail to provide sources that support the generated output. Retrieval-Augmented Generation (RAG) was introduced to address such limitations. One popular implementation, RAG-as-a-Service (RaaS), has shortcomings that hinder its adoption and accessibility. For instance, RaaS pricing is based on the number of submitted prompts, without considering whether the prompts are enriched by relevant chunks (text segments retrieved from a vector database) or the quality of the chunks used (their degree of relevance). This results in an opaque and less cost-effective payment model. We propose Chunk-as-a-Service (CaaS) as a transparent and cost-effective alternative. CaaS includes two variants: Open-Budget CaaS (OB-CaaS) and Limited-Budget CaaS (LB-CaaS), the latter enabled by our "Utility-Cost Online Selection Algorithm (UCOSA)". UCOSA further extends the cost-effectiveness and accessibility of the OB-CaaS variant by enriching, in an online manner, a subset of the submitted prompts based on budget constraints and the utility-cost tradeoff. Our experiments demonstrate the efficacy of the proposed UCOSA compared to both offline and relevance-greedy selection baselines. In terms of the performance metric, defined as the number of enriched prompts (NEP) multiplied by the Average Relevance (AR), UCOSA outperforms random selection by approximately 52% and achieves around 75% of the performance of offline selection methods. Additionally, in terms of budget utilization, LB-CaaS and OB-CaaS achieve performance-to-budget ratios that are higher than that of RaaS by 140% and 86%, respectively, indicating their superior efficiency.
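To make the performance metric concrete, the following is a minimal sketch in Python. Only the metric itself (NEP multiplied by AR) and the idea of a performance-to-budget ratio come from the abstract. The `Prompt` class, its `relevance` and `cost` fields, and the `select_online` threshold rule are hypothetical illustrations: the abstract does not describe how UCOSA makes its decisions, so this rule is a simple stand-in and not the authors' algorithm.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class Prompt:
    # Both fields are assumptions made for illustration only.
    relevance: float  # relevance of the chunks retrieved for this prompt
    cost: float       # price of enriching this prompt with those chunks


def performance(enriched: List[Prompt]) -> float:
    """NEP x AR: number of enriched prompts times their average relevance."""
    nep = len(enriched)
    if nep == 0:
        return 0.0
    ar = sum(p.relevance for p in enriched) / nep
    return nep * ar


def select_online(
    stream: Iterable[Prompt], budget: float, min_ratio: float
) -> Tuple[List[Prompt], float]:
    """Hypothetical stand-in for an online selector (NOT the paper's UCOSA).

    Prompts arrive one at a time. A prompt is enriched only if it still fits
    in the remaining budget and its relevance-per-cost meets a fixed threshold.
    Decisions are final, as in any online setting.
    """
    enriched: List[Prompt] = []
    remaining = budget
    for p in stream:
        if p.cost <= 0:
            continue  # skip malformed entries rather than divide by zero
        if p.cost <= remaining and p.relevance / p.cost >= min_ratio:
            enriched.append(p)
            remaining -= p.cost
    return enriched, budget - remaining


if __name__ == "__main__":
    stream = [Prompt(0.9, 1.0), Prompt(0.2, 1.0), Prompt(0.8, 2.0), Prompt(0.6, 1.0)]
    chosen, spent = select_online(stream, budget=3.0, min_ratio=0.5)
    score = performance(chosen)
    print(len(chosen), spent, score, score / spent)
```

Because AR is the mean relevance over the enriched prompts, NEP multiplied by AR is algebraically the total relevance of the enriched prompts, so the metric rewards enriching many prompts and enriching them well. Dividing it by the budget spent gives a performance-to-budget ratio of the kind the abstract uses to compare LB-CaaS, OB-CaaS, and RaaS.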
Problem

Research questions and friction points this paper is trying to address.

Retrieval-Augmented Generation
Budget-Constrained
Cost-Effectiveness
Chunk Selection
Online Retrieval
Innovation

Methods, ideas, or system contributions that make the work stand out.

Chunk-as-a-Service
Retrieval-Augmented Generation
Online Selection Algorithm
Budget-Constrained Optimization
Utility-Cost Tradeoff
Authors
Shawqi Al-Maliki
Information and Computing Technology (ICT) Division, College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
Ammar Gharaibeh
School of Electrical Engineering and Information Technology, German Jordanian University, Amman, Jordan
Mohamed Rahouti
Fordham University
Computer networking and security, blockchain technology, AI and machine learning
Mohammad Ruhul Amin
Department of Computer and Information Sciences, Fordham University, Bronx, NY, USA
Mohamed Abdallah
Professor and Associate Dean, Hamad Bin Khalifa University
Wireless Communications, Edge AI, Autonomous Vehicles, Wireless Security, Smart Grids
Junaid Qadir
Professor of Computer Engineering, Qatar University
Human-centered AI, AI Ethics, Engineering Education, AI in Education, Healthcare AI
Ala Al-Fuqaha
Hamad Bin Khalifa University (CSE-ICT) and Western Michigan University
Internet of Things, Safe AI, Smart Services, Network Management, Computer Networks