🤖 AI Summary
To support large-scale vessel trajectory analysis and regional monitoring, this study presents a distributed spatial data warehouse system. To address the challenges of cleaning, storing, and querying AIS data efficiently, the work proposes a grid-cell-based spatial partitioning scheme, a modular ETL pipeline, and a rasterized trajectory representation, combined with heatmap visualization. The system is horizontally scalable; the warehouse currently stores approximately 312 million kilometers of ship trajectories, with more than 8 billion rows in its largest table. Queries on the cell representation are faster than queries on the trajectory representation, and with a fivefold increase in workers, cell and heatmap analytics over large areas scale up by 354% to 1164%.
📝 Abstract
AIS data from ships is excellent for analyzing single-ship movements and monitoring all ships within a specific area. However, the AIS data needs to be cleaned, processed, and stored before it is usable. This paper presents a system consisting of an efficient and modular ETL process for loading AIS data, as well as a distributed spatial data warehouse storing the trajectories of ships. To efficiently analyze a large set of ships, a raster approach to querying the AIS data is proposed. A spatially partitioned data warehouse with a granularized cell representation and heatmap presentation is designed, developed, and evaluated. Currently, the data warehouse stores 312 million kilometers of ship trajectories and more than 8 billion rows in the largest table. It is found that searching the cell representation is faster than searching the trajectory representation. Further, we show that the spatially divided shards enable a consistently good scale-up for both cell and heatmap analytics in large areas, ranging from 354% to 1164% with a 5x increase in workers.
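The raster idea can be illustrated with a minimal sketch. The abstract does not specify the grid definition, cell size, or storage layout, so everything below is an assumption made for illustration: a regular longitude/latitude grid, a hypothetical cell size `CELL_SIZE_DEG`, and hypothetical helpers `cell_of`, `heatmap`, and `cells_in_bbox`. It counts position reports per cell, which is only one plausible way to derive a heatmap.

```python
import math
from collections import Counter

# Assumed cell size in degrees; the paper's actual granularity is not given here.
CELL_SIZE_DEG = 0.5


def cell_of(lon, lat, cell_size=CELL_SIZE_DEG):
    """Map a position to an integer (column, row) index on a regular grid."""
    return (math.floor(lon / cell_size), math.floor(lat / cell_size))


def heatmap(points, cell_size=CELL_SIZE_DEG):
    """Count position reports per grid cell for (lon, lat) pairs."""
    return Counter(cell_of(lon, lat, cell_size) for lon, lat in points)


def cells_in_bbox(counts, min_lon, min_lat, max_lon, max_lat,
                  cell_size=CELL_SIZE_DEG):
    """Select the cells whose index falls inside a bounding box.

    The area query reduces to integer range comparisons on cell indices,
    with no geometric test against individual trajectories.
    """
    lo_col, lo_row = cell_of(min_lon, min_lat, cell_size)
    hi_col, hi_row = cell_of(max_lon, max_lat, cell_size)
    return {
        cell: n
        for cell, n in counts.items()
        if lo_col <= cell[0] <= hi_col and lo_row <= cell[1] <= hi_row
    }


# Made-up sample positions, for illustration only.
points = [(10.1, 56.2), (10.3, 56.4), (10.6, 56.2), (12.0, 57.0)]
counts = heatmap(points)
selected = cells_in_bbox(counts, 10.0, 56.0, 11.0, 56.9)
```

In this sketch, once positions are reduced to cell indices, an area query touches only aggregated cells rather than full trajectory geometries, which is one plausible reason a cell representation can be searched faster than a trajectory representation.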