🤖 AI Summary
This study addresses the unsupervised discovery of frequently encountered, semantically coherent personal environments from images taken during ecological momentary assessment (EMA), and the modeling of their associations with health outcomes, in the context of individualized digital phenotyping. The authors propose infinite hierarchical contrastive clustering, which builds on the established contrastive clustering framework and adds two components: (1) a stick-breaking prior on predicted cluster probabilities, which lets the number of clusters adapt to the data without the full Dirichlet Process machinery; and (2) a participant-specific prediction loss, which encourages distinct environments to form well-defined sub-clusters within each cluster of related environments. On real-world EMA image data, the approach groups images into distinct personal environments and organizes these into a hierarchy of meaningful environment types. The authors then illustrate how the resulting clusters can be linked to health outcomes, including depression and fatigue, highlighting the method's potential for studying environment-health relationships.
📝 Abstract
Daily environments have a profound influence on our health and behavior. Recent work has shown that digital envirotyping, in which computer vision is applied to images of daily environments taken during ecological momentary assessment (EMA), can identify meaningful relationships between environmental features and health outcomes of interest. To study such effects systematically at the individual level, it is helpful to group images into the distinct environments encountered in an individual's daily life; these environments may then be analyzed, further grouped into related environments with similar features, and linked to health outcomes. Here we introduce infinite hierarchical contrastive clustering to address this challenge. Building on the established contrastive clustering framework, our method (a) allows an arbitrary number of clusters without requiring the full Dirichlet Process machinery, by placing a stick-breaking prior on predicted cluster probabilities; and (b) encourages distinct environments to form well-defined sub-clusters within each cluster of related environments, by incorporating a participant-specific prediction loss. Our experiments show that our model effectively identifies distinct personal environments and groups these environments into meaningful environment types. We then illustrate how the resulting clusters can be linked to various health outcomes, highlighting the potential of our approach to advance the envirotyping paradigm.
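For readers unfamiliar with the stick-breaking construction mentioned in the abstract, the following is a minimal sketch of the standard construction only, not the authors' implementation. Break fractions $v_k \in (0, 1)$ are mapped to cluster weights $\pi_k = v_k \prod_{j<k} (1 - v_j)$, so later clusters receive progressively smaller shares of the remaining probability mass. The function names, the truncation level `num_sticks`, and the choice $v_k \sim \mathrm{Beta}(1, \alpha)$ are assumptions made for illustration; the abstract does not specify how the prior is applied to the predicted cluster probabilities.

```python
import numpy as np


def stick_breaking_weights(v):
    """Map break fractions v_k in (0, 1) to weights pi_k = v_k * prod_{j<k} (1 - v_j)."""
    v = np.asarray(v, dtype=float)
    # Stick length left before each break: 1, (1 - v_1), (1 - v_1)(1 - v_2), ...
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining


def sample_truncated_weights(alpha, num_sticks, seed=None):
    """Draw weights from a truncated stick-breaking prior (hypothetical helper).

    Assumes v_k ~ Beta(1, alpha). The truncation at num_sticks leaves a small
    amount of unassigned mass, so the weights sum to slightly less than 1.
    """
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, alpha, size=num_sticks)
    return stick_breaking_weights(v)


# Breaking the remaining stick in half three times:
weights = stick_breaking_weights([0.5, 0.5, 0.5])  # [0.5, 0.25, 0.125]
```

In this construction a smaller `alpha` concentrates mass on the first few clusters, while a larger `alpha` spreads it over more clusters, which is what allows the effective number of clusters to vary without being fixed in advance.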