Few-shot Cross-country Generalization of Tabular Machine Learning and Foundation Models for Childhood Anemia Prediction under Distribution Shift

📅 2026-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study examines how well pediatric anemia prediction models generalize across countries under distribution shift and data scarcity. It evaluates the tabular foundation model TabPFN against Logistic Regression, XGBoost, and LightGBM using leave-one-country-out (LOCO) validation, reverse-LOCO transfer, and few-shot settings. TabPFN outperforms the classical models in discrimination and calibration when fewer than 200 samples are available, and across countries it achieves the lowest Brier score (0.042) and Expected Calibration Error (ECE = 0.203). With full data, AUC-ROC ranges from 0.59 to 0.76 and differs by at most 0.05 between models. SHAP analysis identifies child age, altitude, and height-for-age z-score as the dominant predictors. Overall, performance depends more on population variation than on model choice, while TabPFN's advantage is concentrated in low-resource settings.
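The two calibration metrics quoted in the summary can be illustrated with a minimal sketch. It assumes binary labels and predicted probabilities, and it uses equal-width binning with 10 bins for ECE. The listing does not state the paper's binning scheme, so the bin count and the function names `brier_score` and `expected_calibration_error` are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE (assumed scheme): weighted mean of
    |observed positive rate - mean predicted probability| over non-empty bins."""
    bins: List[List[Tuple[int, float]]] = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((y, p))

    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue  # empty bins contribute nothing
        observed = sum(y for y, _ in members) / len(members)
        confidence = sum(p for _, p in members) / len(members)
        ece += (len(members) / n) * abs(observed - confidence)
    return ece
```

Lower is better for both metrics. The Brier score penalizes individual probability errors, while ECE measures how far the average predicted risk in each bin drifts from the observed rate.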

📝 Abstract

Childhood anemia affects around 40% of children aged 6-59 months globally and arises from heterogeneous factors, limiting model generalizability. We evaluate a transformer-based tabular foundation model against classical supervised methods under cross-country and data-scarce settings. We used DHS data from 16 countries across Africa, Asia, Latin America, the Caucasus, and the Middle East (n=68,856). We compared Logistic Regression, XGBoost, LightGBM, and TabPFN v2.6. Performance was assessed using AUC-ROC, Brier score, and ECE. Generalization was evaluated using leave-one-country-out (LOCO), reverse-LOCO, and few-shot settings. Subgroup analyses included sex, age, residence, maternal education, and wealth. Feature importance was estimated using SHAP. TabPFN outperformed classical models in low-data regimes (<200 samples), showing higher discrimination and better calibration. Across countries, it achieved the lowest Brier score (0.042) and ECE (0.203). Under full-data settings, AUC-ROC ranged from 0.59-0.76 with small between-model differences ($\leq 0.05$). LOCO performance was stable (0.58-0.69), driven by country context. Reverse-LOCO showed asymmetric transferability. Subgroup performance was consistent with no systematic demographic bias. SHAP identified child age, altitude, and height-for-age z-score as dominant predictors, followed by wealth and maternal education. Performance in childhood anemia prediction is driven more by population variation than model choice. TabPFN provides advantages in low-resource settings through improved discrimination and calibration, highlighting foundation models as promising tools for data-scarce global health prediction.
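The evaluation protocols named in the abstract can be sketched as index splits over a per-row country label. This is a sketch under stated assumptions, not the paper's code. In particular, the abstract does not define reverse-LOCO; the `reverse=True` branch below assumes it means training on a single country and testing on all others. The names `loco_splits` and `few_shot_sample` are hypothetical.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def loco_splits(
    countries: Sequence[str], reverse: bool = False
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (country, train_idx, test_idx) once per distinct country.

    reverse=False: train on all other countries, test on the held-out one (LOCO).
    reverse=True:  train on the single country, test on the rest
                   (an assumed reading of "reverse-LOCO").
    """
    for country in sorted(set(countries)):
        inside = [i for i, c in enumerate(countries) if c == country]
        outside = [i for i, c in enumerate(countries) if c != country]
        if reverse:
            yield country, inside, outside
        else:
            yield country, outside, inside


def few_shot_sample(indices: Sequence[int], k: int, seed: int = 0) -> List[int]:
    """Draw at most k training rows without replacement, reproducibly."""
    rng = random.Random(seed)
    pool = list(indices)
    return rng.sample(pool, min(k, len(pool)))
```

A few-shot run would then subsample the training indices of each split, for example `few_shot_sample(train_idx, k=200)`, before fitting a model and scoring it on the untouched test indices.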

Problem

Research questions and friction points this paper is trying to address.

few-shot learning

cross-country generalization

distribution shift

childhood anemia

tabular data

Innovation

Methods, ideas, or system contributions that make the work stand out.

foundation models

few-shot learning

cross-country generalization