🤖 AI Summary
This study addresses two bottlenecks in sub-national summer crop yield forecasting in South Africa: complex feature engineering and slow model tuning. We propose a lightweight prediction framework built on the tabular foundation model TabPFN. Our method combines remote sensing and gridded meteorological data (FAPAR, soil moisture, air temperature, precipitation, and solar radiation, aggregated by region to a monthly scale) and feeds them to TabPFN for regression, reducing the feature engineering and hyperparameter tuning that conventional machine learning approaches require. Experimental results show that TabPFN achieves predictive accuracy comparable to tree-based models (e.g., XGBoost, Random Forest) and outperforms simple baselines (e.g., linear regression), while requiring substantially less tuning time and lowering deployment barriers. To our knowledge, this is the first systematic application of TabPFN to agricultural remote sensing–based yield prediction, offering a rapid and scalable approach to agricultural monitoring in resource-constrained regions.
📝 Abstract
We present an application of TabPFN, a foundation model for small- to medium-sized tabular data, to a sub-national yield forecasting task in South Africa. TabPFN has recently demonstrated superior performance compared to traditional machine learning (ML) models in various regression and classification tasks. We used dekadal (10-day) time series of Earth Observation data (EO; FAPAR and soil moisture) and gridded weather data (air temperature, precipitation, and radiation) to forecast the yield of summer crops at the sub-national level. Crop yield data were available for 23 years and for up to 8 provinces. Covariates for TabPFN (i.e., EO and weather variables) were extracted by region and aggregated to a monthly scale. We benchmarked TabPFN against six ML models and three baseline models. A leave-one-year-out cross-validation setup was used to assess each model's capacity to forecast an unseen year. Results showed that TabPFN and the ML models achieved comparable accuracy, and both outperformed the baselines. Nonetheless, TabPFN demonstrated superior practical utility owing to its significantly faster tuning time and reduced need for feature engineering. This makes TabPFN a more viable option for real-world operational yield forecasting, where efficiency and ease of implementation are paramount.
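The two data-handling steps named in the abstract, aggregating dekadal series to a monthly scale and evaluating with leave-one-year-out cross-validation, can be illustrated with a minimal standard-library sketch. Everything in it is an assumption made for illustration rather than the authors' pipeline: the function names (`dekads_to_monthly`, `leave_one_year_out`, `loyo_rmse`), the use of a mean as the default monthly aggregate (a sum may be more appropriate for precipitation), the `MeanBaseline` stand-in model, and the toy yields. Any regressor exposing `fit(X, y)` and `predict(X)` could be passed in place of the stand-in.

```python
from math import sqrt
from statistics import mean


def dekads_to_monthly(dekadal_values, agg=mean):
    """Collapse a dekadal series (3 dekads per month) into monthly values.

    `agg` is an assumption: the mean is used by default, but a sum may
    suit accumulated variables such as precipitation.
    """
    if len(dekadal_values) % 3 != 0:
        raise ValueError("expected a whole number of months (3 dekads each)")
    return [agg(dekadal_values[i:i + 3]) for i in range(0, len(dekadal_values), 3)]


def leave_one_year_out(years):
    """Yield (held_out_year, train_indices, test_indices) for each distinct year."""
    for held_out in sorted(set(years)):
        train = [i for i, y in enumerate(years) if y != held_out]
        test = [i for i, y in enumerate(years) if y == held_out]
        yield held_out, train, test


class MeanBaseline:
    """Hypothetical stand-in model: always predicts the mean training yield."""

    def fit(self, X, y):
        self.mean_ = mean(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


def loyo_rmse(make_model, X, y, years):
    """Pooled RMSE over all held-out years; a fresh model is fitted per fold."""
    squared_errors = []
    for _, train, test in leave_one_year_out(years):
        model = make_model().fit([X[i] for i in train], [y[i] for i in train])
        preds = model.predict([X[i] for i in test])
        squared_errors += [(p - y[i]) ** 2 for p, i in zip(preds, test)]
    return sqrt(mean(squared_errors))


# Toy example: two regions observed over three years (values are made up).
years = [2001, 2001, 2002, 2002, 2003, 2003]
yields = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# One row of monthly features per region-year, built from a 6-dekad toy series.
X = [dekads_to_monthly([0.1, 0.2, 0.3, 0.4, 0.5, 0.6]) for _ in years]

print(loyo_rmse(MeanBaseline, X, yields, years))
```

Splitting by year rather than by random rows keeps every observation from the held-out season out of training, which is what lets the score reflect performance on an unseen year.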