🤖 AI Summary
To address poor scalability, high query latency, and tight architectural coupling in real-time spatial querying over massive mobile object streams, this paper proposes a streaming spatial query processing system built on Apache Flink Stateful Functions. Our approach introduces three key innovations: (1) a lightweight global grid index with efficient metadata synchronization, enabling low-overhead dynamic updates; (2) an actor-like stateful function model integrated with adaptive load balancing, achieving component decoupling and elastic horizontal scaling; and (3) a unified streaming spatial query processing paradigm supporting diverse query types—including range, k-nearest neighbor, and continuous trajectory queries—under a single framework. Experimental evaluation on both real-world and synthetic datasets demonstrates that the system sustains over one million object updates per second, delivers sub-100 ms query latency, and scales linearly to clusters of hundreds of nodes.
📝 Abstract
Spatial data analytics systems are widely studied in both the academia and industry. However, existing systems are limited when handling a large number of moving objects and real time spatial queries. In this work, we architect a scalable and efficient system CheetahGIS to process streaming spatial queries over massive moving objects. In particular, CheetahGIS is built upon Apache Flink Stateful Functions (StateFun), an API for building distributed streaming applications with an actor-like model. CheetahGIS enjoys excellent scalability due to its modular architecture, which clearly decomposes different components and allows scaling individual components. To improve the efficiency and scalability of CheetahGIS, we devise a suite of optimizations, e.g., lightweight global grid-based index, metadata synchroniza tion strategies, and load balance mechanisms. We also formulate a generic paradigm for spatial query processing in CheetahGIS, and verify its generality by processing three representative streaming queries (i.e., object query, range count query, and k nearest neighbor query). We conduct extensive experiments on both real and synthetic datasets to evaluate CheetahGIS.