🤖 AI Summary
To address the low computational efficiency and poor interpretability in modeling geospatial tabular data (GTD), this paper proposes an enhanced GeoAggregator model. Methodologically, it (i) designs an efficient data-loading pipeline and a lightweight forward-propagation architecture to accelerate inference; (ii) introduces the GeoShapley framework—the first application of Shapley-value-based post-hoc attribution to geospatial models—enabling interpretation grounded in spatial dependencies; and (iii) incorporates a Transformer-based ensemble mechanism to improve predictive robustness. Evaluated on a synthetic GTD benchmark, the model achieves a 12.3% improvement in prediction accuracy and a 2.1× speedup in inference time over baseline methods, while empirically demonstrating accurate capture of spatial heterogeneity and proximity effects. All code is publicly available.
📝 Abstract
Accurate modeling and explaining geospatial tabular data (GTD) are critical for understanding geospatial phenomena and their underlying processes. Recent work has proposed a novel transformer-based deep learning model named GeoAggregator (GA) for this purpose, and has demonstrated that it outperforms other statistical and machine learning approaches. In this short paper, we further improve GA by 1) developing an optimized pipeline that accelerates the dataloading process and streamlines the forward pass of GA to achieve better computational efficiency; and 2) incorporating a model ensembling strategy and a post-hoc model explanation function based on the GeoShapley framework to enhance model explainability. We validate the functionality and efficiency of the proposed strategies by applying the improved GA model to synthetic datasets. Experimental results show that our implementation improves the prediction accuracy and inference speed of GA compared to the original implementation. Moreover, explanation experiments indicate that GA can effectively captures the inherent spatial effects in the designed synthetic dataset. The complete pipeline has been made publicly available for community use (https://github.com/ruid7181/GA-sklearn).