Efficient Cloud-edge Collaborative Approaches to SPARQL Queries over Large RDF graphs

📅 2026-01-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the performance bottlenecks of cloud-based architectures when processing SPARQL queries over large RDF graphs under limited bandwidth or high system load. The authors propose a cloud-edge collaborative query processing framework that, for the first time, moves graph data storage and query processing to edge servers. To identify which edge servers can answer a given query, they introduce the concept of pattern-induced subgraphs. They formulate query assignment and computational resource allocation jointly as a mixed-integer nonlinear programming (MINLP) problem and solve it with a modified branch-and-bound algorithm. Experiments on real datasets on a real cloud platform show that the approach outperforms state-of-the-art baselines in efficiency.

📝 Abstract

With the increasing use of RDF graphs, storing and querying such data using SPARQL remains a critical problem. Current mainstream solutions rely on cloud-based data management architectures, but they often suffer from performance bottlenecks in environments with limited bandwidth or high system load. To address this issue, this paper explores, for the first time, the integration of edge computing to move graph data storage and processing to edge environments, thereby improving query performance. This approach requires offloading query processing to edge servers, which raises two challenges: data localization and network scheduling. First, the data localization challenge lies in computing the subgraphs maintained on edge servers so that the servers able to handle a specific query can be identified quickly. To address this challenge, we introduce a new concept of pattern-induced subgraphs. Second, the network scheduling challenge involves efficiently assigning queries to edge and cloud servers to optimize overall system performance. We tackle this by constructing an overall system model that jointly captures data distribution, query characteristics, network communication, and computational resources. Accordingly, we propose a joint formulation of query assignment and computational resource allocation, model it as a Mixed Integer Nonlinear Programming (MINLP) problem, and solve it using a modified branch-and-bound algorithm. Experimental results on real datasets on a real cloud platform demonstrate that our proposed method outperforms state-of-the-art baseline methods in terms of efficiency. The code is available on GitHub.
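
The network scheduling step described in the abstract can be illustrated with a toy example. The sketch below is not the paper's MINLP model or its modified branch-and-bound algorithm: it assumes a much simpler setting with a fixed, precomputed latency per (query, server) pair and a cap on how many queries each server may take, and it uses a plain depth-first branch-and-bound. The function name `assign_queries` and the parameters `cost` and `capacity` are hypothetical, introduced only for illustration.

```python
from math import inf


def assign_queries(cost, capacity):
    """Toy branch-and-bound for assigning queries to edge/cloud servers.

    Assumptions (not from the paper): cost[q][s] is a fixed estimated latency
    of running query q on server s, set to inf when server s does not hold the
    data needed by q; capacity[s] is the maximum number of queries server s
    may take. Returns (best_total_cost, assignment), where assignment[q] is
    the index of the server chosen for query q, or (inf, None) if infeasible.
    """
    n = len(cost)
    # suffix_min[q]: cheapest possible cost of queries q..n-1 when capacity
    # is ignored. This never overestimates, so it is a valid lower bound.
    suffix_min = [0.0] * (n + 1)
    for q in range(n - 1, -1, -1):
        suffix_min[q] = suffix_min[q + 1] + min(cost[q])

    best = [inf, None]          # best total cost and assignment found so far
    load = [0] * len(capacity)  # number of queries currently on each server
    chosen = []                 # partial assignment along the current branch

    def branch(q, so_far):
        # Prune: even the optimistic completion cannot beat the incumbent.
        if so_far + suffix_min[q] >= best[0]:
            return
        if q == n:
            best[0], best[1] = so_far, chosen.copy()
            return
        # Try cheaper servers first so good incumbents are found early.
        for s in sorted(range(len(capacity)), key=lambda s: cost[q][s]):
            if cost[q][s] == inf or load[s] >= capacity[s]:
                continue
            load[s] += 1
            chosen.append(s)
            branch(q + 1, so_far + cost[q][s])
            chosen.pop()
            load[s] -= 1

    branch(0, 0.0)
    return best[0], best[1]
```

For instance, with two edge servers (indices 0 and 1) that each accept one query, a cloud server (index 2) that accepts three, and `cost = [[1.0, inf, 4.0], [2.0, 3.0, 4.0], [1.5, inf, 5.0]]`, the search sends the first query to the cloud so that the third query can use the first edge server. In the paper's formulation the cost of a query also depends on how computational resources are allocated, which is what makes the real problem nonlinear.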

Problem

Research questions and friction points this paper is trying to address.

SPARQL queries

RDF graphs

cloud-edge collaboration

query performance

data localization

Innovation

Methods, ideas, or system contributions that make the work stand out.

edge computing

SPARQL query

pattern-induced subgraphs