🤖 AI Summary
Problem: Scarce occurrence records and limited georeferenced data hinder accurate modeling of the spatial distribution of *Commelina benghalensis* L. in sugarcane fields on Réunion Island.
Method: We propose a data augmentation framework based on spatial interpolation and systematically compare Gaussian process regression (GPR) using RBF, Matérn, and composite (GP-COMB) kernels against ordinary kriging with multiple variogram models. Generalization performance is assessed with spatial cross-validation.
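The GP side of this comparison can be illustrated with a minimal NumPy sketch. It assumes a zero-mean GP fitted to centered targets, fixed (not fitted) hyperparameters, and a hypothetical GP-COMB-style kernel defined as an equal-weight sum of an RBF and a Matérn-5/2 kernel; the source does not say how its composite kernel is built. Synthetic samples are created by taking the posterior mean at random locations inside the bounding box of the observations. The helper names (`composite`, `gp_posterior`, `augment_with_gp`) and the coordinates and targets are made up for illustration.

```python
import numpy as np


def sq_dist(a, b):
    """Squared Euclidean distances between rows of a (n, d) and rows of b (m, d)."""
    return np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)


def rbf(a, b, length_scale=1.0):
    """Squared-exponential (RBF) kernel."""
    return np.exp(-0.5 * sq_dist(a, b) / length_scale**2)


def matern52(a, b, length_scale=1.0):
    """Matern kernel with smoothness nu = 5/2."""
    r = np.sqrt(5.0 * sq_dist(a, b)) / length_scale
    return (1.0 + r + r**2 / 3.0) * np.exp(-r)


def composite(a, b):
    """Hypothetical GP-COMB-style kernel: equal-weight sum of RBF and Matern-5/2."""
    return 0.5 * rbf(a, b, length_scale=2.0) + 0.5 * matern52(a, b, length_scale=0.5)


def gp_posterior(x_train, y_train, x_new, kernel=composite, noise=1e-2):
    """Posterior mean and variance at x_new of a GP fitted to centered targets."""
    y_mean = y_train.mean()
    k_tt = kernel(x_train, x_train) + noise * np.eye(len(x_train))
    k_nt = kernel(x_new, x_train)
    chol = np.linalg.cholesky(k_tt)  # k_tt = chol @ chol.T
    # alpha = k_tt^{-1} (y - mean), via two triangular-system solves.
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train - y_mean))
    mean = k_nt @ alpha + y_mean
    v = np.linalg.solve(chol, k_nt.T)
    var = np.diag(kernel(x_new, x_new)) - np.sum(v**2, axis=0)
    return mean, var


def augment_with_gp(x_obs, y_obs, n_new, rng):
    """Append n_new synthetic samples at random locations in the data's bounding box."""
    lo, hi = x_obs.min(axis=0), x_obs.max(axis=0)
    x_new = rng.uniform(lo, hi, size=(n_new, x_obs.shape[1]))
    y_new, _ = gp_posterior(x_obs, y_obs, x_new)  # synthetic targets = posterior mean
    return np.vstack([x_obs, x_new]), np.concatenate([y_obs, y_new])


# Made-up example: 30 observed plots with 2-D coordinates and a continuous target.
rng = np.random.default_rng(0)
x_obs = rng.uniform(0, 10, size=(30, 2))
y_obs = np.sin(x_obs[:, 0]) + 0.1 * x_obs[:, 1]
x_aug, y_aug = augment_with_gp(x_obs, y_obs, n_new=20, rng=rng)
print(x_aug.shape, y_aug.shape)  # (50, 2) (50,)
```

Using the posterior mean gives smooth synthetic targets; sampling from the posterior instead would preserve more variability. In practice the kernel hyperparameters would usually be fitted by maximizing the marginal likelihood (for example with scikit-learn's `GaussianProcessRegressor`), and synthetic points would be generated only from the training folds of each spatial cross-validation split so that no information leaks into the held-out blocks.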
Contribution/Results: This study provides the first quantitative comparison of GPR and kriging for agricultural weed mapping, explicitly characterizing the trade-off between prediction accuracy and spatial consistency. GP-COMB yields substantial gains in downstream regression performance while requiring fewer added observations, whereas kriging produces more spatially uniform synthetic samples at the cost of slightly lower average accuracy. The framework offers a reproducible, spatially aware approach to data augmentation for small-sample geographic and ecological modeling.
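The spatial-uniformity contrast above can be quantified in several ways, and the source does not say which measure it uses. As one illustrative stand-in, the sketch below computes the Clark-Evans nearest-neighbour ratio R: the mean nearest-neighbour distance divided by its expectation under complete spatial randomness, 0.5/√(n/A) for n points in area A. Values near 1 indicate a random pattern, values above 1 a more regular (uniform) one, and values below 1 clustering. No edge correction is applied, so this is a rough sketch; the function name `clark_evans_ratio` and the example point sets are hypothetical.

```python
import numpy as np


def clark_evans_ratio(points, area):
    """Clark-Evans nearest-neighbour ratio for 2-D points (no edge correction)."""
    n = len(points)
    d = np.sqrt(np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=-1))
    np.fill_diagonal(d, np.inf)  # ignore each point's zero distance to itself
    mean_nn = d.min(axis=1).mean()
    expected = 0.5 / np.sqrt(n / area)  # mean NN distance under spatial randomness
    return mean_nn / expected


# Perfectly regular 10 x 10 grid in the unit square (spacing 0.1).
g = np.linspace(0.05, 0.95, 10)
xx, yy = np.meshgrid(g, g)
grid = np.column_stack([xx.ravel(), yy.ravel()])

# Strongly clustered points: 100 samples packed around the square's centre.
rng = np.random.default_rng(0)
clustered = rng.normal(0.5, 0.01, size=(100, 2))

print(round(float(clark_evans_ratio(grid, 1.0)), 3))  # 2.0
print(clark_evans_ratio(clustered, 1.0))  # far below 1
```

Applied to the synthetic locations produced by each augmentation method (with `area` set to the surveyed plot area), a higher R would correspond to the more homogeneous coverage attributed to kriging.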
📝 Abstract
Data augmentation is a crucial step in developing robust supervised learning models, especially when datasets are small. This study explores interpolation techniques for augmenting georeferenced data, with the aim of predicting the presence of *Commelina benghalensis* L. in sugarcane plots in La Réunion. Given the spatial nature of the data and the high cost of data collection, we evaluated two interpolation approaches: Gaussian processes (GPs) with different kernels and kriging with various variograms. The objectives of this work are threefold: (i) to identify which interpolation methods offer the best predictive performance across various regression algorithms, (ii) to analyze how performance changes with the number of added observations, and (iii) to assess the spatial consistency of the augmented datasets. The results show that GP-based methods, in particular those with combined kernels (GP-COMB), significantly improve the performance of regression algorithms while requiring less additional data. Although kriging performs slightly worse, it provides more homogeneous spatial coverage, which may be an advantage in certain contexts.
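Kriging, the second interpolation approach, can be sketched as ordinary kriging in NumPy. This is a minimal illustration assuming isotropic variograms with fixed, hand-picked parameters (nugget, partial sill, range) rather than parameters fitted to an empirical variogram. Each prediction solves the ordinary kriging system, in which a Lagrange multiplier forces the weights to sum to 1; the spherical and exponential models are stand-ins for the "various variograms" the study compares, whose exact list is not given here. All function names and the example data are hypothetical.

```python
import numpy as np


def spherical(h, nugget=0.0, psill=1.0, vrange=3.0):
    """Spherical variogram: rises to nugget + psill at distance vrange, flat beyond."""
    h = np.asarray(h, dtype=float)
    s = np.clip(h / vrange, 0.0, 1.0)
    g = nugget + psill * (1.5 * s - 0.5 * s**3)
    return np.where(h > 0, g, 0.0)  # gamma(0) = 0 by definition


def exponential(h, nugget=0.0, psill=1.0, vrange=3.0):
    """Exponential variogram; vrange is the practical range (about 95% of the sill)."""
    h = np.asarray(h, dtype=float)
    g = nugget + psill * (1.0 - np.exp(-3.0 * h / vrange))
    return np.where(h > 0, g, 0.0)


def pairwise_dist(a, b):
    """Euclidean distances between rows of a (n, 2) and rows of b (m, 2)."""
    return np.sqrt(np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1))


def ordinary_kriging(x_obs, y_obs, x_new, variogram=spherical):
    """Ordinary kriging predictions and kriging variances at x_new."""
    n = len(x_obs)
    # Kriging matrix [[Gamma, 1], [1^T, 0]]; the extra row/column enforces sum(w) = 1.
    a = np.ones((n + 1, n + 1))
    a[:n, :n] = variogram(pairwise_dist(x_obs, x_obs))
    a[n, n] = 0.0
    # One right-hand side per target location x0: [gamma(x_i, x0); 1].
    b = np.ones((n + 1, len(x_new)))
    b[:n, :] = variogram(pairwise_dist(x_obs, x_new))
    sol = np.linalg.solve(a, b)
    w, mu = sol[:n, :], sol[n, :]
    pred = w.T @ y_obs
    var = np.sum(w * b[:n, :], axis=0) + mu  # sigma^2 = sum_i w_i gamma_i0 + mu
    return pred, var


# Made-up example: 30 observed plots and 20 synthetic locations to fill in.
rng = np.random.default_rng(1)
x_obs = rng.uniform(0, 10, size=(30, 2))
y_obs = np.sin(x_obs[:, 0]) + 0.1 * x_obs[:, 1]
x_new = rng.uniform(0, 10, size=(20, 2))
for model in (spherical, exponential):
    y_new, k_var = ordinary_kriging(x_obs, y_obs, x_new, variogram=model)
    print(model.__name__, y_new.shape)  # e.g. "spherical (20,)"
```

With a zero nugget, ordinary kriging is an exact interpolator: at an observed location the weights collapse onto that observation and the kriging variance drops to zero. Dedicated libraries such as PyKrige provide variogram fitting and more models if a production implementation is needed.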