CheetahGIS: Architecting a Scalable and Efficient Streaming Spatial Query Processing System

📅 2025-11-12
🤖 AI Summary
To address poor scalability, high query latency, and tight architectural coupling in real-time spatial querying over massive moving-object streams, this paper proposes CheetahGIS, a streaming spatial query processing system built on Apache Flink Stateful Functions (StateFun). The system makes three main contributions: (1) a lightweight global grid-based index with metadata synchronization strategies that keep dynamic updates cheap; (2) an actor-like stateful function model combined with adaptive load balancing, which decouples components and lets each one scale horizontally on its own; and (3) a generic streaming spatial query processing paradigm, verified on three representative query types (object query, range count query, and k-nearest-neighbor query) under a single framework. Experimental evaluation on both real-world and synthetic datasets demonstrates that the system sustains over one million object updates per second, delivers sub-100 ms query latency, and scales linearly to clusters of hundreds of nodes.

📝 Abstract
Spatial data analytics systems are widely studied in both academia and industry. However, existing systems are limited when handling a large number of moving objects and real-time spatial queries. In this work, we architect CheetahGIS, a scalable and efficient system for processing streaming spatial queries over massive moving objects. In particular, CheetahGIS is built upon Apache Flink Stateful Functions (StateFun), an API for building distributed streaming applications with an actor-like model. CheetahGIS enjoys excellent scalability due to its modular architecture, which cleanly separates its components and allows each to be scaled individually. To improve the efficiency and scalability of CheetahGIS, we devise a suite of optimizations, e.g., a lightweight global grid-based index, metadata synchronization strategies, and load-balancing mechanisms. We also formulate a generic paradigm for spatial query processing in CheetahGIS, and verify its generality by processing three representative streaming queries (i.e., object query, range count query, and k-nearest-neighbor query). We conduct extensive experiments on both real and synthetic datasets to evaluate CheetahGIS.
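
To make the grid-based index and the range count query concrete, here is a minimal single-process sketch in Python. It is not the CheetahGIS implementation: the class `GridIndex`, its methods, and the fixed `cell_size` are hypothetical names chosen for illustration, and the distributed parts (metadata synchronization, load balancing) are left out entirely.

```python
from collections import defaultdict
from math import floor


class GridIndex:
    """Uniform grid over the plane (illustrative only, not the paper's code).

    Each cell stores the latest known position of the objects inside it.
    """

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(dict)  # (cx, cy) -> {object_id: (x, y)}
        self.where = {}                 # object_id -> (cx, cy) it currently occupies

    def cell_of(self, x, y):
        # Map a coordinate to the integer ID of the cell that contains it.
        return (floor(x / self.cell_size), floor(y / self.cell_size))

    def update(self, obj_id, x, y):
        # Apply one location update; move the object if it crossed a cell border.
        new_cell = self.cell_of(x, y)
        old_cell = self.where.get(obj_id)
        if old_cell is not None and old_cell != new_cell:
            del self.cells[old_cell][obj_id]
        self.cells[new_cell][obj_id] = (x, y)
        self.where[obj_id] = new_cell

    def range_count(self, x_min, y_min, x_max, y_max):
        # Visit only the cells overlapping the query rectangle, then filter
        # each candidate by its exact position.
        cx_min, cy_min = self.cell_of(x_min, y_min)
        cx_max, cy_max = self.cell_of(x_max, y_max)
        count = 0
        for cx in range(cx_min, cx_max + 1):
            for cy in range(cy_min, cy_max + 1):
                for x, y in self.cells.get((cx, cy), {}).values():
                    if x_min <= x <= x_max and y_min <= y <= y_max:
                        count += 1
        return count


index = GridIndex(cell_size=10.0)
index.update("a", 1.0, 1.0)
index.update("b", 12.0, 3.0)
index.update("c", 25.0, 25.0)
index.update("a", 11.0, 2.0)  # "a" moves from cell (0, 0) to cell (1, 0)
print(index.range_count(10, 0, 20, 10))
```

The point of the grid is that both an update and a query touch only the cells they overlap, which is what makes it natural to spread cells across workers in a distributed setting.
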
Problem

Research questions and friction points this paper is trying to address.

Handling large-scale moving objects in real-time spatial queries
Overcoming scalability limitations in streaming spatial data systems
Improving efficiency of distributed spatial query processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Built on Apache Flink Stateful Functions for streaming
Uses modular architecture for scalable component decomposition
Employs grid index and load balancing for efficiency
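
The actor-like model behind the modular architecture can be sketched as follows. This is a toy, in-process illustration of the general idea (function instances addressed by a type and an ID, each with private state, communicating only through messages); it does not use or imitate the StateFun SDK, and `Runtime`, `tracker`, `cell`, and `CELL_SIZE` are hypothetical names.

```python
from collections import deque

CELL_SIZE = 10.0  # assumed grid resolution, for illustration only


class Runtime:
    """Toy message dispatcher standing in for a stateful-function runtime."""

    def __init__(self):
        self.handlers = {}      # function type -> handler
        self.state = {}         # (function type, instance id) -> private dict
        self.mailbox = deque()  # pending (type, id, message) triples

    def register(self, fn_type, handler):
        self.handlers[fn_type] = handler

    def send(self, fn_type, fn_id, message):
        self.mailbox.append((fn_type, fn_id, message))

    def run(self):
        # Deliver messages in FIFO order; each instance sees only its own state.
        while self.mailbox:
            fn_type, fn_id, message = self.mailbox.popleft()
            storage = self.state.setdefault((fn_type, fn_id), {})
            self.handlers[fn_type](self, fn_id, storage, message)


def tracker(rt, obj_id, storage, msg):
    # One instance per moving object: remembers the object's current cell
    # and notifies the affected cell instances when it changes.
    new_cell = (int(msg["x"] // CELL_SIZE), int(msg["y"] // CELL_SIZE))
    old_cell = storage.get("cell")
    if old_cell is not None and old_cell != new_cell:
        rt.send("cell", old_cell, {"kind": "leave", "obj": obj_id})
    storage["cell"] = new_cell
    rt.send("cell", new_cell,
            {"kind": "enter", "obj": obj_id, "x": msg["x"], "y": msg["y"]})


def cell(rt, cell_id, storage, msg):
    # One instance per grid cell: holds the objects currently inside it.
    if msg["kind"] == "leave":
        storage.pop(msg["obj"], None)
    else:
        storage[msg["obj"]] = (msg["x"], msg["y"])


rt = Runtime()
rt.register("tracker", tracker)
rt.register("cell", cell)
rt.send("tracker", "a", {"x": 1.0, "y": 1.0})
rt.send("tracker", "a", {"x": 12.0, "y": 3.0})
rt.run()
print(rt.state[("cell", (1, 0))])
```

Because the tracker and cell functions share nothing except messages, a real runtime could place and scale each function type independently, which is the property the modular decomposition relies on.
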
Jiaping Cao
Department of Computing, Hong Kong Polytechnic University
Ting Sun
Department of Computer Science, Southern University of Science and Technology
Man Lung Yiu
Professor, Hong Kong Polytechnic University
Database
Xiao Yan
Institute for Math & AI, Wuhan University
Bo Tang
Department of Computer Science, Southern University of Science and Technology