From Rows to Yields: How Foundation Models for Tabular Data Simplify Crop Yield Prediction

📅 2025-06-23
🤖 AI Summary
This study addresses two bottlenecks in sub-national summer crop yield forecasting in South Africa: complex feature engineering and slow model tuning. It applies the tabular foundation model TabPFN to Earth Observation and gridded meteorological data (FAPAR, soil moisture, air temperature, precipitation, and solar radiation), extracted by region and aggregated from dekadal to monthly values, and uses them as regression covariates with less feature engineering and tuning than conventional machine learning approaches require. Under leave-one-year-out cross-validation, TabPFN reaches accuracy comparable to six machine learning models and outperforms three baseline models, while tuning significantly faster. The authors conclude that this makes TabPFN a more practical option for operational yield forecasting, where efficiency and ease of implementation matter.

📝 Abstract
We present an application of TabPFN, a foundation model for small- to medium-sized tabular data, to a sub-national yield forecasting task in South Africa. TabPFN has recently demonstrated superior performance compared to traditional machine learning (ML) models in various regression and classification tasks. We used dekadal (10-day) time series of Earth Observation data (EO; FAPAR and soil moisture) and gridded weather data (air temperature, precipitation and radiation) to forecast the yield of summer crops at the sub-national level. Crop yield data were available for 23 years and for up to 8 provinces. Covariates for TabPFN (i.e., EO and weather variables) were extracted by region and aggregated to a monthly scale. We benchmarked TabPFN against six ML models and three baseline models. A leave-one-year-out cross-validation setting was used to assess the models' capacity to forecast an unseen year. Results showed that TabPFN and the ML models exhibit comparable accuracy, and both outperform the baselines. Nonetheless, TabPFN demonstrated superior practical utility due to its significantly faster tuning time and reduced requirement for feature engineering. This makes TabPFN a more viable option for real-world operational yield forecasting applications, where efficiency and ease of implementation are paramount.
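The leave-one-year-out protocol mentioned in the abstract can be illustrated with a minimal, standard-library-only sketch. Everything named here is an assumption made for illustration: `leave_one_year_out`, `loyo_rmse` and `ProvinceMeanBaseline` are hypothetical helpers, the baseline is not necessarily one of the paper's three baselines, pooled RMSE is only one possible accuracy metric, and the toy numbers are invented.

```python
from collections import defaultdict
from math import sqrt


def leave_one_year_out(years):
    """Yield (held_out_year, train_idx, test_idx), holding out one year at a time."""
    for held_out in sorted(set(years)):
        train_idx = [i for i, y in enumerate(years) if y != held_out]
        test_idx = [i for i, y in enumerate(years) if y == held_out]
        yield held_out, train_idx, test_idx


class ProvinceMeanBaseline:
    """Hypothetical baseline: predict each province's mean yield over the training years."""

    def fit(self, provinces, yields):
        sums, counts = defaultdict(float), defaultdict(int)
        for province, value in zip(provinces, yields):
            sums[province] += value
            counts[province] += 1
        self.means_ = {p: sums[p] / counts[p] for p in sums}
        self.global_mean_ = sum(yields) / len(yields)
        return self

    def predict(self, provinces):
        # Fall back to the global mean for a province unseen during training.
        return [self.means_.get(p, self.global_mean_) for p in provinces]


def loyo_rmse(years, provinces, yields, model_factory):
    """Pool out-of-sample errors across all held-out years and return the RMSE."""
    squared_errors = []
    for _, train_idx, test_idx in leave_one_year_out(years):
        # A fresh model per fold, so no information leaks from the held-out year.
        model = model_factory().fit(
            [provinces[i] for i in train_idx], [yields[i] for i in train_idx]
        )
        preds = model.predict([provinces[i] for i in test_idx])
        squared_errors += [(p - yields[i]) ** 2 for p, i in zip(preds, test_idx)]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Toy data (invented for illustration, not from the paper): 2 provinces x 3 years.
years = [2001, 2001, 2002, 2002, 2003, 2003]
provinces = ["A", "B", "A", "B", "A", "B"]
yields = [2.0, 4.0, 3.0, 5.0, 4.0, 6.0]
rmse = loyo_rmse(years, provinces, yields, ProvinceMeanBaseline)
```

A feature-based model would receive the covariate rows in place of the province labels; the splitting logic stays the same, which is what keeps every prediction strictly out of sample for its year.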
Problem

Research questions and friction points this paper is trying to address.

Applying TabPFN to predict crop yields in South Africa
Comparing TabPFN with traditional ML models for accuracy
Evaluating TabPFN's efficiency in real-world yield forecasting
Innovation

Methods, ideas, or system contributions that make the work stand out.

TabPFN for tabular data yield prediction
Uses Earth Observation and weather data
Faster tuning and less feature engineering
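As a rough illustration of the limited feature preparation described above, the sketch below collapses dekadal records into one row of monthly values per region and season. It is not the authors' pipeline: `monthly_features`, the record layout, the variable names and the choice of mean versus sum per variable are all assumptions, and the toy values are invented.

```python
from collections import defaultdict
from statistics import mean


def monthly_features(records, months, aggregators=None):
    """Aggregate dekadal records into one feature row per (region, season).

    records:     iterable of (region, season, month, variable, value) tuples,
                 typically three dekadal values per month and variable.
    months:      ordered list of months to keep as columns.
    aggregators: optional {variable: function}; variables not listed use the mean.
    Returns (keys, columns, rows), where rows[i] is the feature vector of keys[i].
    """
    aggregators = aggregators or {}
    buckets = defaultdict(list)
    variables = set()
    for region, season, month, variable, value in records:
        if month in months:
            buckets[(region, season, variable, month)].append(value)
            variables.add(variable)

    columns = [(v, m) for v in sorted(variables) for m in months]
    keys = sorted({(region, season) for region, season, _, _ in buckets})
    rows = []
    for region, season in keys:
        row = []
        for variable, month in columns:
            values = buckets.get((region, season, variable, month))
            agg = aggregators.get(variable, mean)
            # Missing months become None; how gaps are filled is left open here.
            row.append(agg(values) if values else None)
        rows.append(row)
    return keys, columns, rows


# Toy dekadal records (invented): one region, one season, two months.
records = [
    ("Region A", 2020, 1, "fapar", 0.25),
    ("Region A", 2020, 1, "fapar", 0.5),
    ("Region A", 2020, 1, "fapar", 0.75),
    ("Region A", 2020, 1, "precip", 10.0),
    ("Region A", 2020, 1, "precip", 20.0),
    ("Region A", 2020, 1, "precip", 30.0),
    ("Region A", 2020, 2, "fapar", 0.5),
    ("Region A", 2020, 2, "fapar", 0.5),
    ("Region A", 2020, 2, "fapar", 0.5),
]
keys, columns, rows = monthly_features(records, [1, 2], aggregators={"precip": sum})
```

The resulting rows, paired with the observed yields, form the kind of table that a regressor with a scikit-learn-style `fit`/`predict` interface accepts; the `tabpfn` package's `TabPFNRegressor` is understood to follow that interface, though the missing values would first need to be handled in whatever way the chosen model supports.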
Authors

Filip Sabo
European Commission - Joint Research Centre
remote sensing

Michele Meroni
Joint Research Centre of European Commission, IES, MARS
Remote Sensing, Vegetation, Agriculture, Time series analysis, Hyperspectral

Maria Piles
Universitat de València, Image Processing Laboratory

Martin Claverie
Joint Research Center, European Commission
remote sensing, agriculture

Fanie Ferreira
GeoTerraImage (Pty) Ltd, Pretoria, South Africa

Elna Van Den Berg
GeoTerraImage (Pty) Ltd, Pretoria, South Africa

Francesco Collivignarelli
European Commission, Joint Research Centre (JRC), Ispra, Italy

Felix Rembold
Team Leader Food Security, Joint Research Centre of the European Commission, Ispra
remote sensing, agriculture, land use, drought