Semantic Caching for Improving Web Affordability

📅 2025-06-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Rising web scale exacerbates network burdens in developing countries, where high data costs constrain accessibility. To address this, we propose a large language model (LLM)-based semantic caching framework that transcends conventional exact-match caching by leveraging multimodal LLMs (GPT-4o and LLaMA 3.1) to assess the semantic substitutability of images, enabling reuse while preserving visual fidelity. Our end-to-end architecture integrates user-configurable policies and privacy-preserving mechanisms. Empirical evaluation across 50 mainstream news websites demonstrates up to 37% image substitutability for certain categories; a prototype implementation reduces transmission volume by approximately 10% compared to traditional caching. This work presents the first empirical validation of open-source multimodal LLMs for large-scale semantic caching, establishing a novel paradigm for enhancing web affordability and efficiency in resource-constrained environments.
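The summary above describes exact-match caching extended with an LLM substitutability check. A minimal sketch of that lookup flow is below; `llm_judge` is a hypothetical stand-in for the multimodal-LLM call (GPT-4o or LLaMA 3.1 in the paper) and is stubbed with word overlap here purely so the example is self-contained, and the threshold plays the role of the paper's user-configurable policy.

```python
from dataclasses import dataclass, field

def llm_judge(requested_desc: str, cached_desc: str) -> float:
    """Stub for a multimodal-LLM substitutability score in [0, 1].

    A real implementation would send both images to the model and ask
    whether the cached image preserves the requested image's meaning;
    this stub approximates that with word overlap for illustration only.
    """
    requested = set(requested_desc.split())
    shared = requested & set(cached_desc.split())
    return len(shared) / max(len(requested), 1)

@dataclass
class SemanticImageCache:
    threshold: float = 0.8  # user-configurable substitutability policy
    store: dict = field(default_factory=dict)  # url -> (description, bytes)

    def get(self, url: str, desc: str):
        # Exact-match hit, as in conventional caching.
        if url in self.store:
            return self.store[url][1]
        # Semantic hit: reuse a cached image judged substitutable.
        for cached_desc, data in self.store.values():
            if llm_judge(desc, cached_desc) >= self.threshold:
                return data
        return None  # miss: the client must download the image

    def put(self, url: str, desc: str, data: bytes):
        self.store[url] = (desc, data)
```

A semantic hit avoids re-downloading a new but semantically equivalent image, which is where the reported byte savings beyond exact caching would come from.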


๐Ÿ“ Abstract
The rapid growth of web content has led to increasingly large webpages, posing significant challenges for Internet affordability, especially in developing countries where data costs remain prohibitively high. We propose semantic caching using Large Language Models (LLMs) to improve web affordability by enabling reuse of semantically similar images within webpages. Analyzing 50 leading news and media websites, encompassing 4,264 images and over 40,000 image pairs, we demonstrate potential for significant data transfer reduction, with some website categories showing up to 37% of images as replaceable. Our proof-of-concept architecture shows users can achieve approximately 10% greater byte savings compared to exact caching. We evaluate both commercial and open-source multimodal LLMs for assessing semantic replaceability. GPT-4o performs best with a low Normalized Root Mean Square Error of 0.1735 and a weighted F1 score of 0.8374, while the open-source LLaMA 3.1 model shows comparable performance, highlighting its viability for large-scale applications. This approach offers benefits for both users and website operators, substantially reducing data transmission. We discuss ethical concerns and practical challenges, including semantic preservation, user-driven cache configuration, privacy concerns, and potential resistance from website operators.
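The abstract reports two evaluation metrics for the LLM judgments: Normalized Root Mean Square Error and weighted F1. A hedged sketch of how such metrics could be computed from model scores against human labels is below; the exact normalization and binarization the paper uses are not stated on this page, so this sketch assumes NRMSE is RMSE divided by the label range and that weighted F1 averages per-class F1 weighted by class support.

```python
import numpy as np

def nrmse(human: np.ndarray, model: np.ndarray) -> float:
    # Root mean square error, normalized by the range of the human labels
    # (one common convention; the paper's exact normalization is unstated).
    rmse = np.sqrt(np.mean((human - model) ** 2))
    return float(rmse / (human.max() - human.min()))

def weighted_f1(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    # Per-class F1, averaged with weights proportional to class support.
    classes = np.unique(y_true)
    total, score = len(y_true), 0.0
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += (np.sum(y_true == c) / total) * f1
    return float(score)
```

Lower NRMSE means the model's replaceability scores track human ratings more closely, while weighted F1 measures agreement on the discrete replaceable/not-replaceable decision.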
Problem

Research questions and friction points this paper is trying to address.

Reducing web data costs via semantic image caching
Evaluating LLMs for image replaceability in webpages
Addressing ethical and practical caching challenges
Innovation

Methods, ideas, or system contributions that make the work stand out.

Semantic caching using LLMs for web affordability
Reuse semantically similar images to reduce data
Multi-modal LLMs assess image replaceability effectively
Hafsa Akbar
Lahore University of Management Sciences, Pakistan
Danish Athar
Lahore University of Management Sciences, Pakistan
Muhammad Ayain Fida Rana
Lahore University of Management Sciences, Pakistan
Chaudhary Hammad Javed
Lahore University of Management Sciences, Pakistan
Zartash Afzal Uzmi
Lahore University of Management Sciences, Pakistan
Ihsan Ayyub Qazi
Full Professor of Computer Science, LUMS
Networked Systems (Digital Development, Misinformation, GenAI, Digital Health)
Zafar Ayyub Qazi
Associate Professor, LUMS
Networked Systems