🤖 AI Summary
Exposome research is hampered by the high dimensionality of exposure data, its multimodal sources, and its varying spatiotemporal scales, so few composite measures of the exposome exist. To address this, the Centralized Health and Exposomic Resource (C-HER) project presents a data engineering solution aimed at AI modeling. It identifies, profiles, spatially indexes, and stores over 30 external exposomic datasets, resolving the spatiotemporal linkage needed to align data across scales. The resulting analytic and AI-ready data (AAIRD) support applications such as regional environmental characterization, air pollution modeling, and indicators for cancer research. The work provides a reusable pipeline from exposome data acquisition to AI model input, laying groundwork for studies of how environmental and genetic factors together shape health.
📝 Abstract
The Centralized Health and Exposomic Resource (C-HER) project has identified, profiled, spatially indexed, and stored over 30 external exposomic datasets. The resulting analytic and AI-ready data (AAIRD) provides a significant opportunity to develop an integrated picture of the exposome for health research. The exposome is a conceptual framework designed to guide the study of the complex environmental and genetic factors that together shape human health. Few composite measures of the exposome exist because of the high dimensionality of exposure data, multimodal data sources, and varying spatiotemporal scales. We develop a data engineering solution that overcomes the challenges of spatiotemporal linkage in this field. We provide examples of how environmental data can be combined to characterize a region, model air pollution, or provide indicators for cancer research. The development of AAIRD will allow future studies to use machine learning (ML) and deep learning methods to generate spatial and contextual exposure data for disease prediction.
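To make the idea of spatiotemporal linkage concrete, the following is a minimal sketch and not the C-HER implementation. It assumes a simple latitude/longitude grid as the spatial index and calendar months as the temporal unit; the cell size, field names (`lat`, `lon`, `date`, `value`), and function names are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from datetime import date

CELL_DEG = 0.5  # assumed grid resolution in degrees (hypothetical)


def cell_id(lat, lon, cell_deg=CELL_DEG):
    """Map a coordinate to a (row, col) grid-cell index."""
    return (int((lat + 90) // cell_deg), int((lon + 180) // cell_deg))


def build_index(exposure_rows):
    """Group exposure values by (grid cell, year, month)."""
    index = defaultdict(list)
    for row in exposure_rows:
        d = row["date"]
        index[(cell_id(row["lat"], row["lon"]), d.year, d.month)].append(row["value"])
    return index


def link(record, index):
    """Return the mean exposure for the record's cell and month, or None if no data."""
    d = record["date"]
    values = index.get((cell_id(record["lat"], record["lon"]), d.year, d.month))
    return sum(values) / len(values) if values else None


# Illustrative, made-up measurements (not real data).
exposures = [
    {"lat": 40.1, "lon": -75.2, "date": date(2020, 1, 5), "value": 10.0},
    {"lat": 40.3, "lon": -75.4, "date": date(2020, 1, 20), "value": 14.0},
]
index = build_index(exposures)
linked = link({"lat": 40.2, "lon": -75.3, "date": date(2020, 1, 15)}, index)
```

A production pipeline would more likely use a geospatial library and a proper spatial index, but the same principle applies: every dataset is reduced to a shared space-time key so that records from different sources can be joined.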