Efficient Cloud-edge Collaborative Approaches to SPARQL Queries over Large RDF graphs

📅 2026-01-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the performance bottlenecks that traditional cloud architectures face when processing SPARQL queries over large-scale RDF graphs under bandwidth-constrained or high-load conditions. The authors propose a cloud-edge collaborative query processing framework that introduces, for the first time, a data placement strategy based on pattern-induced subgraphs. They formulate a mixed-integer nonlinear programming (MINLP) model that jointly optimizes query assignment and computational resource allocation, and solve it with a modified branch-and-bound algorithm. Experimental evaluation on real datasets on a real cloud platform shows that the proposed approach outperforms state-of-the-art baselines in query efficiency.
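The data-localization step summarized above amounts to deciding quickly which edge servers hold enough data to answer a given query. The sketch below is only an illustration of that lookup, not the authors' method: it assumes, for simplicity, that each edge server's stored subgraph can be described by the set of predicates it covers, and that a query can run on a server when all of the query's predicates are covered. The function name `candidate_servers` and the server names are hypothetical.

```python
from typing import Dict, List, Set


def candidate_servers(query_predicates: Set[str],
                      server_patterns: Dict[str, Set[str]]) -> List[str]:
    """Return the edge servers whose stored predicates cover the whole query.

    Assumption: a server's subgraph is summarized by a predicate set.
    The paper's pattern-induced subgraphs may be defined differently.
    """
    return sorted(
        name
        for name, predicates in server_patterns.items()
        if query_predicates <= predicates  # subset test: every predicate is covered
    )


# Hypothetical placement of data on two edge servers.
servers = {
    "edge-1": {"foaf:knows", "foaf:name"},
    "edge-2": {"foaf:name", "dbo:birthPlace"},
}

both = candidate_servers({"foaf:name"}, servers)
only_first = candidate_servers({"foaf:knows", "foaf:name"}, servers)
none = candidate_servers({"dbo:author"}, servers)
```

An empty result, as in the last call, would mean that no edge server can answer the query locally, so under this sketch it would fall back to the cloud.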

📝 Abstract
With the increasing use of RDF graphs, storing and querying such data using SPARQL remains a critical problem. Current mainstream solutions rely on cloud-based data management architectures, but they often suffer from performance bottlenecks in environments with limited bandwidth or high system load. To address this issue, this paper explores, for the first time, the integration of edge computing to move graph data storage and processing to edge environments, thereby improving query performance. This approach requires offloading query processing to edge servers, which involves two challenges: data localization and network scheduling. First, the data localization challenge lies in computing the subgraphs maintained on edge servers so that the servers able to handle a specific query can be identified quickly. To address this challenge, we introduce a new concept of pattern-induced subgraphs. Second, the network scheduling challenge involves efficiently assigning queries to edge and cloud servers to optimize overall system performance. We tackle this by constructing an overall system model that jointly captures data distribution, query characteristics, network communication, and computational resources. Accordingly, we propose a joint formulation of query assignment and computational resource allocation, model it as a Mixed Integer Nonlinear Programming (MINLP) problem, and solve it using a modified branch-and-bound algorithm. Experimental results on real datasets on a real cloud platform demonstrate that our proposed method outperforms state-of-the-art baseline methods in terms of efficiency. The code is available on GitHub.
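To make the network-scheduling idea concrete, here is a deliberately simplified branch-and-bound sketch for assigning queries to servers. It is not the paper's MINLP model or its modified algorithm: it assumes a fixed, additive cost per (query, server) pair and a simple per-server limit on the number of queries, whereas the paper also optimizes computational resource allocation. The function name `assign_queries` and all input values are hypothetical.

```python
from typing import Dict, List, Tuple


def assign_queries(cost: List[Dict[str, float]],
                   capacity: Dict[str, int]) -> Tuple[float, List[str]]:
    """Assign each query to one server, minimizing total cost under capacities.

    cost[i] maps each server able to run query i to an estimated cost.
    Returns (inf, []) when no feasible assignment exists.
    """
    n = len(cost)
    # Lower bound on the cost of queries i..n-1, ignoring capacities.
    suffix_min = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix_min[i] = suffix_min[i + 1] + min(cost[i].values())

    best_cost = float("inf")
    best_plan: List[str] = []
    load = {server: 0 for server in capacity}
    plan: List[str] = []

    def branch(i: int, so_far: float) -> None:
        nonlocal best_cost, best_plan
        if so_far + suffix_min[i] >= best_cost:
            return  # bound: this subtree cannot beat the incumbent
        if i == n:
            best_cost, best_plan = so_far, plan.copy()
            return
        # Try cheaper servers first so a good incumbent is found early.
        for server, c in sorted(cost[i].items(), key=lambda kv: kv[1]):
            if load[server] < capacity[server]:
                load[server] += 1
                plan.append(server)
                branch(i + 1, so_far + c)
                plan.pop()
                load[server] -= 1

    branch(0, 0.0)
    return best_cost, best_plan


# Hypothetical costs: the third query can only be answered in the cloud.
costs = [
    {"edge-1": 1.0, "cloud": 5.0},
    {"edge-1": 2.0, "cloud": 3.0},
    {"cloud": 4.0},
]
total, plan = assign_queries(costs, {"edge-1": 1, "cloud": 10})
```

Because the edge server in this toy input accepts only one query, the search has to weigh which query benefits most from running at the edge. The pruning test uses the capacity-free lower bound, which never overestimates the remaining cost, so the returned assignment is optimal for this simplified model.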
Problem

Research questions and friction points this paper is trying to address.

SPARQL queries
RDF graphs
cloud-edge collaboration
query performance
data localization
Innovation

Methods, ideas, or system contributions that make the work stand out.

edge computing
SPARQL query
pattern-induced subgraphs
cloud-edge collaboration
MINLP
Shidan Ma
College of Computer Science and Electronic Engineering, Hunan University, China
Peng Peng
Hunan University
RDF, Distributed Database, Graph Database
Xu Zhou
College of Computer Science and Electronic Engineering, Hunan University, China
M. T. Ozsu
University of Waterloo, Canada
Lei Zou
Professor at Peking University
Graph Database, RDF Data Management, Knowledge Graph
Guo Chen
Computer Science and Technology, Tsinghua University
Speech Separation, Artificial Intelligence