GeoReg: Weight-Constrained Few-Shot Regression for Socio-Economic Estimation using LLM

📅 2025-07-17
🤖 AI Summary
This study addresses the challenge of accurately estimating socio-economic indicators, such as GDP, population, and educational attainment, in data-scarce regions, particularly low- and middle-income countries. We propose an LLM-guided few-shot regression framework in which a large language model extracts semantic and spatial features from satellite imagery and web-based geospatial data. The LLM categorizes each feature by its correlation with the target indicator, and the regression applies weight constraints matched to each category, together with nonlinear feature interactions, to improve generalization under extreme data scarcity. Empirical evaluation across three countries with varying development levels shows consistent gains over state-of-the-art baselines; notably, prediction error decreases by 23.6% on average in low-income settings with minimal training samples. Our core contribution is a transferable, interpretable, and annotation-efficient remote sensing approach to socio-economic estimation that reduces reliance on extensive labeled ground-truth data while preserving model transparency and cross-context adaptability.


📝 Abstract
Socio-economic indicators such as regional GDP, population, and education levels are crucial to shaping policy decisions and fostering sustainable development. This research introduces GeoReg, a regression model that integrates diverse data sources, including satellite imagery and web-based geospatial information, to estimate these indicators even for data-scarce regions such as developing countries. Our approach leverages the prior knowledge of a large language model (LLM) to address the scarcity of labeled data: the LLM functions as a data engineer, extracting informative features that enable effective estimation in few-shot settings. Specifically, the model identifies contextual relationships between data features and the target indicator, categorizing each feature's correlation as positive, negative, mixed, or irrelevant. These features are then fed into a linear estimator with weight constraints tailored to each category. To capture nonlinear patterns, the model also identifies meaningful feature interactions and incorporates them, together with nonlinear transformations, into the model. Experiments across three countries at different stages of development demonstrate that our model outperforms baselines in estimating socio-economic indicators, even for low-income countries with limited data availability.
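To make the category-specific weight constraints concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a sign-based reading of the four categories (positive means a non-negative weight, negative a non-positive weight, mixed an unconstrained weight, irrelevant a weight fixed at zero), which the abstract does not spell out. `CATEGORY_BOUNDS`, `add_interactions`, and `fit_constrained_linear` are hypothetical names, and projected gradient descent is swapped in as a simple solver; the paper's actual estimator, interaction selection, and nonlinear transformations may differ.

```python
from typing import Dict, List, Sequence, Tuple

INF = float("inf")

# ASSUMPTION: one plausible category -> (lower, upper) weight-bound mapping.
# The abstract says each category gets tailored constraints but does not list them.
CATEGORY_BOUNDS: Dict[str, Tuple[float, float]] = {
    "positive": (0.0, INF),    # weight forced to be >= 0
    "negative": (-INF, 0.0),   # weight forced to be <= 0
    "mixed": (-INF, INF),      # sign left to the data
    "irrelevant": (0.0, 0.0),  # weight pinned at 0, i.e. feature dropped
}


def add_interactions(rows: Sequence[Sequence[float]],
                     pairs: Sequence[Tuple[int, int]]) -> List[List[float]]:
    """Hypothetical helper: append a product feature x_i * x_j for every (i, j)."""
    return [list(r) + [r[i] * r[j] for i, j in pairs] for r in rows]


def fit_constrained_linear(X: Sequence[Sequence[float]], y: Sequence[float],
                           categories: Sequence[str], lr: float = 0.01,
                           steps: int = 5000) -> Tuple[List[float], float]:
    """Mean-squared-error linear fit with per-feature box constraints,
    solved by projected gradient descent (the intercept is unconstrained)."""
    n, d = len(X), len(X[0])
    bounds = [CATEGORY_BOUNDS[c] for c in categories]
    w = [0.0] * d  # w = 0 satisfies every category's constraint
    b = 0.0
    for _ in range(steps):
        # residual r_k = prediction_k - y_k
        r = [sum(wj * xj for wj, xj in zip(w, x)) + b - yk for x, yk in zip(X, y)]
        grad_w = [2.0 / n * sum(r[k] * X[k][j] for k in range(n)) for j in range(d)]
        grad_b = 2.0 / n * sum(r)
        # gradient step, then clamp each weight back into its allowed interval
        w = [min(max(wj - lr * g, lo), hi)
             for wj, g, (lo, hi) in zip(w, grad_w, bounds)]
        b -= lr * grad_b
    return w, b


# Toy usage: y = 2*x0 - x1 + 0.5; the third feature carries no signal.
X = [[0, 1, 3], [1, 0, 1], [2, 1, 0], [3, 2, 2], [1, 3, 1], [2, 0, 4]]
y = [2 * x0 - x1 + 0.5 for x0, x1, _ in X]
w, b = fit_constrained_linear(X, y, ["positive", "negative", "irrelevant"])
print([round(v, 3) for v in w], round(b, 3))  # expect approximately [2.0, -1.0, 0.0] 0.5
```

Clamping each weight after the gradient step is exactly the Euclidean projection onto the box defined by its bounds, so every iterate stays feasible. A product column produced by `add_interactions` would need its own category entry, for example "mixed" when no sign can be assumed in advance.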
Problem

Estimates socio-economic indicators in data-scarce regions
Leverages LLM prior knowledge for few-shot regression
Integrates diverse data sources such as satellite imagery

Innovation

Integrates satellite and geospatial data
Uses an LLM for feature extraction
Applies tailored weight constraints