AI Summary
This study addresses two challenges in high-radon regions such as Pennsylvania and Utah: identifying at-risk populations and predicting radon exposure accurately. We developed the first state-level, high-resolution radon exposure dataset that integrates heterogeneous geospatial predictors (soil type, geochemistry, land surface temperature) with sociodemographic predictors (building age, heating system, housing structure). Methodologically, the work fuses multisource, heterogeneous spatial data at sub-kilometer resolution using geographically weighted regression, multi-scale raster-vector alignment, and a standardized metadata framework. The resulting open-source dataset supports spatial prediction of indoor radon concentrations and identification of high-risk populations at the ZCTA level and at finer geographic units. It has already enabled two follow-up studies and provides a scalable foundation for nationwide (CONUS) radon exposure reclassification and targeted public health interventions.
Abstract
Exposure to elevated radon levels in the home is one of the leading causes of lung cancer in the world. Mitigating this risk requires identifying both the underlying geological causes and the populations that might be at risk. This work focuses on identifying at-risk populations throughout Pennsylvania and Utah, where radon levels are some of the highest in the country. The following study describes the creation of a comprehensive, state-level dataset designed to enable the modeling and prediction of household radon concentrations at Zip Code Tabulation Area (ZCTA) and sub-kilometer scales. Details include the data collection and processing involved in compiling physical and demographic factors for the two states. The resulting dataset harmonizes geological and demographic factors from various sources and spatial resolutions, including temperature, geochemistry, and soil characteristics. Demographic variables such as the household heating fuel used, the age of the building, and the housing type provide further insight into which populations could be most susceptible in areas with potentially high radon levels. This dataset also serves as a foundational resource for two other studies conducted by the authors. The fine resolution of the data enables a novel approach to predicting potential radon exposure, and the data processing conducted for these states can be scaled up to larger spatial extents (e.g., the Contiguous United States [CONUS]), allowing for a broad reclassification of radon exposure potential in the United States.
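To make the harmonization step concrete, here is a minimal sketch of how a gridded physical variable and a tabular demographic variable might be brought to a common ZCTA level. It is an illustration under stated assumptions, not the authors' pipeline: the function names, the field names (`grid_mean`, `median_year_built`), and the ZCTA identifiers are hypothetical, and it assumes each grid cell has already been assigned to a ZCTA (for example, by cell centroid).

```python
from collections import defaultdict
from statistics import mean


def aggregate_cells_to_zcta(cells):
    """Average gridded values per ZCTA.

    `cells` is an iterable of (zcta_id, value) pairs; cells with a
    missing value (None) are skipped.
    """
    grouped = defaultdict(list)
    for zcta_id, value in cells:
        if value is not None:
            grouped[zcta_id].append(value)
    return {zcta_id: mean(values) for zcta_id, values in grouped.items()}


def harmonize(grid_cells, demographics):
    """Join per-ZCTA grid means with a per-ZCTA demographic table.

    Only ZCTAs present in both inputs are kept (an inner join).
    """
    grid_means = aggregate_cells_to_zcta(grid_cells)
    merged = {}
    for zcta_id, demo in demographics.items():
        if zcta_id in grid_means:
            merged[zcta_id] = {"grid_mean": grid_means[zcta_id], **demo}
    return merged


# Hypothetical inputs: placeholder ZCTA ids and made-up values.
grid_cells = [("A", 1.0), ("A", 3.0), ("B", 2.0), ("B", None), ("C", 5.0)]
demographics = {
    "A": {"median_year_built": 1950},
    "B": {"median_year_built": 1985},
}

result = harmonize(grid_cells, demographics)
```

The inner join is a deliberate simplification: a ZCTA with grid coverage but no demographic record (here `"C"`) is dropped rather than filled with a placeholder. A real workflow would more likely use area-weighted zonal statistics than a simple cell average.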