🤖 AI Summary
Conventional socioeconomic indicators (e.g., parental status) may inadequately capture heterogeneous social influences on educational attainment. Method: This study proposes the "prediction gap", the difference in predictive performance between models of varying complexity (logistic regression, gradient boosting, graph neural networks), as a way to detect where social context matters beyond what simple models capture. Using population-scale Dutch administrative data, the authors compare how well these models predict university completion from early-life social contexts such as families, schools, and neighborhoods. Contribution/Results: Overall prediction gaps are small, suggesting that previously identified indicators, particularly parental status, capture most measurable variation in educational attainment. Gaps are larger, however, for girls growing up without fathers, suggesting that the effects of social context for this group go beyond simple models. The study illustrates how prediction methods can support sociological explanation by showing for whom, and under what conditions, existing theories succeed or fall short.
📝 Abstract
Social contexts -- such as families, schools, and neighborhoods -- shape life outcomes. The key question is not simply whether they matter, but rather for whom and under what conditions. Here, we argue that prediction gaps -- differences in predictive performance between statistical models of varying complexity -- offer a pathway for identifying surprising empirical patterns (i.e., patterns not captured by simpler models) that highlight where theories succeed or fall short. Using population-scale administrative data from the Netherlands, we compare logistic regression, gradient boosting, and graph neural networks in predicting university completion from early-life social contexts. Overall, prediction gaps are small, suggesting that previously identified indicators, particularly parental status, capture most measurable variation in educational attainment. However, gaps are larger for girls growing up without fathers -- suggesting that, in line with sociological theory, the effects of social context for this group go beyond what simple models capture. Our paper shows the potential of prediction methods to support sociological explanation.
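To make the notion of a prediction gap concrete, the following is a minimal sketch of how such a gap could be computed overall and per subgroup. It rests on assumptions not stated in the abstract: the performance metric (AUC here), the function names `auc` and `prediction_gaps`, and the toy scores and group labels are all hypothetical and are not taken from the study or its data.

```python
from collections import defaultdict


def auc(y_true, scores):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def prediction_gaps(y_true, simple_scores, complex_scores, groups):
    """Return {group: AUC(complex) - AUC(simple)}, plus an 'overall' entry.

    Assumes no subgroup is itself labeled 'overall'.
    """
    by_group = defaultdict(list)
    for row in zip(y_true, simple_scores, complex_scores, groups):
        by_group["overall"].append(row)
        by_group[row[3]].append(row)
    gaps = {}
    for name, rows in by_group.items():
        y, simple, complex_, _ = zip(*rows)
        gaps[name] = auc(y, complex_) - auc(y, simple)
    return gaps


# Toy illustration (made-up numbers): both models agree on group A,
# but only the more complex model ranks group B correctly.
y_true = [1, 0, 1, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
simple_scores = [0.8, 0.3, 0.7, 0.4, 0.4, 0.6, 0.7, 0.5]
complex_scores = [0.8, 0.3, 0.7, 0.4, 0.8, 0.3, 0.7, 0.4]

gaps = prediction_gaps(y_true, simple_scores, complex_scores, groups)
for name, gap in sorted(gaps.items()):
    print(f"{name}: gap = {gap:.3f}")
```

In this toy setup the gap is zero for group A and clearly positive for group B, mirroring the kind of pattern the abstract describes: a modest overall gap can coexist with a substantially larger gap in a specific subgroup.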