How Your Location Relates to Health: Variable Importance and Interpretable Machine Learning for Environmental and Sociodemographic Data

📅 2025-01-03
🤖 AI Summary
This study examines how the associations between residential environmental and sociodemographic factors and multiple health outcomes (asthma, hypertension, and anxiety) vary across space and time, including during the COVID-19 pandemic. Using fine-grained spatiotemporal data from the MEDSAT dataset for England, we develop an integrative framework that combines interpretable machine learning with spatial statistics: variable importance assessment (via permutation tests and SHAP), generalized additive models (GAMs), and multiscale geographically weighted regression (MGWR). This allows global associations and local spatial heterogeneity to be identified together. Results identify NO₂ as a global predictor for all three conditions; occupational status, marital status, and vegetation cover show outcome-specific effects; and the associations of air pollution and solar radiation with health vary by region, with notable shifts during the pandemic. The framework offers an interpretable and generalizable approach to spatiotemporal analysis in environmental epidemiology.
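The MGWR component can be illustrated through its simpler building block, ordinary geographically weighted regression (GWR): a separate weighted least-squares fit at every location, with nearby observations weighted more heavily. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. It uses a single fixed Gaussian bandwidth shared by all covariates, whereas MGWR estimates a separate bandwidth per covariate. The function name `gwr_local_coefficients`, the bandwidth value, and the synthetic data are hypothetical.

```python
import numpy as np

def gwr_local_coefficients(coords, X, y, bandwidth):
    """Hypothetical helper: plain GWR with one shared Gaussian bandwidth.

    coords: (n, 2) array of point locations
    X: (n, p) design matrix (include a column of ones for an intercept)
    y: (n,) response
    bandwidth: kernel bandwidth, in the same units as coords
    Returns an (n, p) array with one coefficient vector per location.
    """
    n, p = X.shape
    betas = np.empty((n, p))
    for i in range(n):
        # Squared distances from location i to every observation
        d2 = np.sum((coords - coords[i]) ** 2, axis=1)
        # Gaussian kernel: nearby observations get weights close to 1
        w = np.exp(-0.5 * d2 / bandwidth**2)
        # Weighted least squares: solve (X^T W X) beta = X^T W y
        XtW = X.T * w
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas

# Synthetic example (hypothetical data): the effect of x grows from west to east
rng = np.random.default_rng(0)
n = 400
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.normal(size=n)
true_slope = 0.2 * coords[:, 0]
y = 1.0 + true_slope * x + rng.normal(scale=0.1, size=n)
X = np.column_stack([np.ones(n), x])
betas = gwr_local_coefficients(coords, X, y, bandwidth=1.0)
```

Because the true slope increases with the east-west coordinate, the fitted local slopes in `betas[:, 1]` should rise accordingly. In practice the bandwidth is usually selected by a criterion such as AICc or cross-validation rather than fixed by hand.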

📝 Abstract
Health outcomes depend on complex environmental and sociodemographic factors whose effects change over location and time. Fine-grained spatial and temporal data for studying these effects have only recently become available, notably the MEDSAT dataset of English health, environmental, and sociodemographic information. Leveraging this new resource, we use a variety of variable importance techniques to robustly identify the most informative predictors across multiple health outcomes. We then develop an interpretable machine learning framework based on Generalized Additive Models (GAMs) and Multiscale Geographically Weighted Regression (MGWR) to analyze both the local and global spatial dependence of various health outcomes on each variable. Our findings identify NO2 as a global predictor for asthma, hypertension, and anxiety, alongside other outcome-specific predictors related to occupation, marriage, and vegetation. Regional analyses reveal local variations in the associations with air pollution and solar radiation, with notable shifts during COVID. This comprehensive approach provides actionable insights for addressing health disparities, and advocates for the integration of interpretable machine learning in public health.
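The abstract does not specify which variable importance techniques are used. One common model-agnostic option is permutation importance, which measures how much a fitted model's prediction error grows when a single predictor is randomly shuffled. Here is a minimal NumPy sketch assuming an ordinary least-squares model as a stand-in predictor and synthetic data. The helper names (`fit_ols`, `predict_ols`, `permutation_importance`) and all data are hypothetical, not taken from the paper.

```python
import numpy as np

def fit_ols(X, y):
    # Least-squares fit with an intercept column prepended
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

def permutation_importance(coef, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when each column of X is shuffled."""
    rng = np.random.default_rng(seed)
    base_mse = np.mean((y - predict_ols(coef, X)) ** 2)
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        increases = []
        for _ in range(n_repeats):
            Xp = X.copy()
            # Shuffle column j to break its link with the outcome
            Xp[:, j] = rng.permutation(Xp[:, j])
            mse = np.mean((y - predict_ols(coef, Xp)) ** 2)
            increases.append(mse - base_mse)
        importances[j] = np.mean(increases)
    return importances

# Synthetic example (hypothetical data): x0 strong, x1 weak, x2 pure noise
rng = np.random.default_rng(1)
n = 1000
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
train, test = slice(0, 700), slice(700, None)
coef = fit_ols(X[train], y[train])
imp = permutation_importance(coef, X[test], y[test])
```

Importance is computed on held-out rows so that it reflects predictive value rather than in-sample fit. With strongly correlated predictors, such as co-located pollutants, shuffling one column can give misleading scores, which is one reason to compare several importance measures.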
Problem

Research questions and friction points this paper is trying to address.

Residential Environment
Population Characteristics
Health Impact
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interpretable Machine Learning
Environmental and Demographic Factors
Health Impact Analysis
Ishaan Maitra
Duke University, North Carolina, United States
Raymond Lin
Duke University, North Carolina, United States
Eric Chen
Duke University, North Carolina, United States
Jon Donnelly
PhD Student at Duke University
Interpretable Machine Learning
Sanja Šćepanović
Nokia Bell Labs, Cambridge, United Kingdom
Cynthia Rudin
Professor of Computer Science, ECE, Statistics, and Biostatistics & Bioinformatics, Duke University
machine learning, interpretability, data science