Can Moran Eigenvectors Improve Machine Learning of Spatial Data? Insights from Synthetic Data Validation

📅 2025-04-16
🤖 AI Summary
This study investigates whether Moran eigenvector filters (MEFs), used as spatial covariates, enhance the spatial modeling capability of mainstream machine learning models (Random Forests, LightGBM, XGBoost, TabNet). Method: The authors generate synthetic spatial data with known spatially varying and nonlinear effects over two geometric structures (areal and network domains), systematically evaluate MEFs built from different spatial weights matrices, with and without a priori eigenvector selection, and conduct interpretability analysis via GeoShapley. Contribution/Results: For processes with positive spatial autocorrelation, models using only coordinate features achieve higher cross-validated R² than those incorporating MEFs; this finding does not necessarily carry over to negative spatial autocorrelation or network autocorrelation, where MEFs would still be useful. The work offers an empirical benchmark of MEFs across tree-based and deep learning models, clarifies their applicability conditions and limitations, and gives methodological guidance for spatial feature engineering in geographical machine learning.

📝 Abstract
Moran Eigenvector Spatial Filtering (ESF) approaches have shown promise in accounting for spatial effects in statistical models. Can this extend to machine learning? This paper examines the effectiveness of using Moran Eigenvectors as additional spatial features in machine learning models. We generate synthetic datasets with known processes involving spatially varying and nonlinear effects across two different geometries. Moran Eigenvectors calculated from different spatial weights matrices, with and without a priori eigenvector selection, are tested. We assess the performance of popular machine learning models, including Random Forests, LightGBM, XGBoost, and TabNet, and benchmark their accuracies in terms of cross-validated R² values against models that use only coordinates as features. We also extract coefficients and functions from the models using GeoShapley and compare them with the true processes. Results show that machine learning models using only location coordinates achieve better accuracies than eigenvector-based approaches across various experiments and datasets. Furthermore, we discuss that while these findings are relevant for spatial processes that exhibit positive spatial autocorrelation, they do not necessarily apply when modeling network autocorrelation and cases with negative spatial autocorrelation, where Moran Eigenvectors would still be useful.
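
To make the idea of "Moran Eigenvectors as additional spatial features" concrete, the following is a minimal sketch of the standard construction: eigenvectors of the doubly centered spatial weights matrix MWM, where M = I - 11'/n. It is not the authors' code. The regular grid, the rook-contiguity weights, the helper names `rook_weights` and `moran_eigenvectors`, and the rule of keeping only eigenvectors with positive eigenvalues are all assumptions chosen for illustration.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Binary rook-contiguity weights for a regular grid (illustrative helper)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if r + 1 < nrows:                 # neighbor in the next row
                j = (r + 1) * ncols + c
                W[i, j] = W[j, i] = 1.0
            if c + 1 < ncols:                 # neighbor in the next column
                j = r * ncols + (c + 1)
                W[i, j] = W[j, i] = 1.0
    return W

def moran_eigenvectors(W, positive_only=True, tol=1e-8):
    """Eigen-decompose M W M, with M = I - 11'/n the centering projector."""
    n = W.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    C = M @ W @ M
    C = (C + C.T) / 2.0                       # guard against round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(C)      # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if positive_only:                         # assumed a priori selection rule
        keep = eigvals > tol
        eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]
    return eigvals, eigvecs

# Assumed setup: a 10 x 10 grid of areal units.
W = rook_weights(10, 10)
eigvals, E = moran_eigenvectors(W)

# Coordinates of each cell, for the coordinate-only baseline feature set.
rows, cols = np.divmod(np.arange(W.shape[0]), 10)
coords = np.column_stack([cols, rows]).astype(float)
```

The columns of `E` are mutually orthogonal, zero-mean map patterns, and each could be appended to a model's covariates as a spatial feature, while `coords` corresponds to the coordinate-only alternative that the abstract benchmarks against.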

Problem

Research questions and friction points this paper is trying to address.

Evaluating Moran Eigenvectors' effectiveness in spatial machine learning
Comparing eigenvector-based models with coordinate-only models in accuracy
Assessing applicability for positive vs negative spatial autocorrelation cases
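
The distinction between positive and negative spatial autocorrelation in the last point can be illustrated with Moran's I. The sketch below computes it for two toy patterns, and it is only an illustration under stated assumptions: the 6 x 6 grid, the rook-contiguity weights, and the helper names `rook_weights` and `morans_i` are not taken from the paper.

```python
import numpy as np

def rook_weights(nrows, ncols):
    """Binary rook-contiguity weights for a regular grid (illustrative helper)."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if r + 1 < nrows:
                j = (r + 1) * ncols + c
                W[i, j] = W[j, i] = 1.0
            if c + 1 < ncols:
                j = r * ncols + (c + 1)
                W[i, j] = W[j, i] = 1.0
    return W

def morans_i(x, W):
    """Moran's I = (n / S0) * (z' W z) / (z' z), with z the mean-centered x."""
    z = x - x.mean()
    return (len(x) / W.sum()) * (z @ W @ z) / (z @ z)

W = rook_weights(6, 6)
rows, cols = np.divmod(np.arange(36), 6)

# Smooth west-to-east gradient: neighboring cells hold similar values.
gradient = cols.astype(float)

# Checkerboard: every rook neighbor holds the opposite value.
checkerboard = np.where((rows + cols) % 2 == 0, 1.0, -1.0)

i_pos = morans_i(gradient, W)       # positive autocorrelation, I > 0
i_neg = morans_i(checkerboard, W)   # perfect negative autocorrelation, I = -1
print(f"gradient: {i_pos:.3f}, checkerboard: {i_neg:.3f}")
```

A smooth surface like `gradient` is the kind of pattern that coordinates alone can describe well, whereas an alternating pattern like `checkerboard` has no smooth dependence on location, which is consistent with the abstract's remark that Moran Eigenvectors would still be useful under negative spatial autocorrelation.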

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using Moran Eigenvectors as spatial features
Testing various spatial weights matrices
Benchmarking against coordinate-only models
Ziqi Li
Assistant Professor, Florida State University
Spatial Data Science, GIScience, Spatial Statistics
Zhan Peng
Transportation and Geographic Information Science Lab, Graduate School of Information Sciences, Tohoku University, Japan