Demographic Distribution Matching between real world and virtual phantom population

📅 2025-07-15
🤖 AI Summary
Virtual imaging trials (VITs) have limited clinical translatability when the demographics of the virtual population do not match those of the real-world population. Method: We propose DISTINCT, a framework for jointly aligning the multidimensional distribution of continuous (e.g., age, BMI) and categorical (e.g., sex, race, ethnicity) covariates. DISTINCT discretizes the covariates into joint multidimensional bins and samples real-world participants so that their joint distribution matches the virtual cohort. It then identifies the largest subsample size at which alignment remains acceptable. Distributional similarity is quantified with Wasserstein and Kolmogorov–Smirnov (K-S) distances. ROC analysis checks whether AUC estimates are stable across subsample sizes, and stratified AUC examines variation across demographic subgroups. Results: Applied to NLST and its companion virtual trial (VLST), DISTINCT identified a maximal aligned NLST subsample of 9,974 participants, and AUC estimates stabilized beyond 6,000 participants. Stratified AUC varied substantially across demographic subgroups. The method selects real-world subsamples that are demographically representative of virtual cohorts, supporting more reliable comparisons in VITs.
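
For a single continuous covariate such as age, both similarity measures compare empirical cumulative distribution functions (ECDFs). The 1-D Wasserstein-1 distance is the area between the two ECDFs, and the two-sample K-S statistic is their largest vertical gap. The NumPy sketch below implements those textbook definitions. It is not the authors' code, and the function names are illustrative. SciPy's `scipy.stats.wasserstein_distance` and `scipy.stats.ks_2samp` compute the same quantities.

```python
import numpy as np

def ecdf_on_grid(sample, grid):
    """Fraction of `sample` values <= each grid point (right-continuous ECDF)."""
    sample = np.sort(np.asarray(sample, dtype=float))
    return np.searchsorted(sample, grid, side="right") / sample.size

def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two empirical 1-D samples:
    the area between their ECDFs, i.e. the integral of |F_u(x) - F_v(x)| dx."""
    grid = np.sort(np.concatenate([u, v]).astype(float))
    widths = np.diff(grid)              # gaps between consecutive pooled values
    f_u = ecdf_on_grid(u, grid[:-1])    # both ECDFs are constant on each gap
    f_v = ecdf_on_grid(v, grid[:-1])
    return float(np.sum(np.abs(f_u - f_v) * widths))

def ks_statistic(u, v):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_u(x) - F_v(x)|.
    The supremum is attained at one of the pooled sample values."""
    grid = np.concatenate([u, v]).astype(float)
    return float(np.max(np.abs(ecdf_on_grid(u, grid) - ecdf_on_grid(v, grid))))
```

The two measures answer different questions. The Wasserstein distance is expressed in the covariate's own units (years for age) and reflects how far probability mass must move. The K-S statistic lies in [0, 1] and reports the worst-case disagreement in cumulative proportions.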

📝 Abstract
Virtual imaging trials (VITs) offer scalable and cost-effective tools for evaluating imaging systems and protocols. However, their translational impact depends on rigorous comparability between virtual and real-world populations. This study introduces DISTINCT (Distributional Subsampling for Covariate-Targeted Alignment), a statistical framework for selecting demographically aligned subsamples from large clinical datasets to support robust comparisons with virtual cohorts. We applied DISTINCT to the National Lung Screening Trial (NLST) and a companion virtual trial dataset (VLST). The algorithm jointly aligned typical continuous (age, BMI) and categorical (sex, race, ethnicity) variables by constructing multidimensional bins based on discretized covariates. For a given target size, DISTINCT samples individuals to match the joint demographic distribution of the reference population. We evaluated the demographic similarity between VLST and progressively larger NLST subsamples using Wasserstein and Kolmogorov-Smirnov (K-S) distances to identify the maximal subsample size with acceptable alignment. The algorithm identified a maximal aligned NLST subsample of 9,974 participants, preserving demographic similarity to the VLST population. Receiver operating characteristic (ROC) analysis using risk scores for lung cancer detection showed that area under the curve (AUC) estimates stabilized beyond 6,000 participants, confirming the sufficiency of aligned subsamples for virtual imaging trial evaluation. Stratified AUC analysis revealed substantial performance variation across demographic subgroups, reinforcing the importance of covariate alignment in comparative studies.
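
The binning-and-sampling step can be sketched as stratified sampling over joint bins, in four steps:

1. Discretize the continuous covariates.
2. Combine them with the categorical covariates into a single bin key.
3. Split the target size across bins in proportion to the reference (virtual) cohort.
4. Draw that many real-world participants from each bin.

The bin edges, the record field names, and the largest-remainder rounding are assumptions made for illustration. The abstract does not specify how DISTINCT discretizes covariates or rounds per-bin quotas.

```python
import random
from collections import Counter, defaultdict

# Hypothetical bin edges for illustration; not taken from the paper.
AGE_EDGES = (55, 60, 65, 70, 75)   # years
BMI_EDGES = (18.5, 25.0, 30.0)     # kg/m^2

def bin_index(value, edges):
    """Index of the interval containing `value` (count of edges <= value)."""
    return sum(value >= e for e in edges)

def joint_bin(person):
    """Joint bin key: discretized age and BMI plus the categorical covariates."""
    return (bin_index(person["age"], AGE_EDGES),
            bin_index(person["bmi"], BMI_EDGES),
            person["sex"], person["race"], person["ethnicity"])

def allocate(ref_counts, n):
    """Split n across bins in proportion to reference counts, using
    largest-remainder rounding so the quotas sum exactly to n."""
    total = sum(ref_counts.values())
    raw = {b: n * c / total for b, c in ref_counts.items()}
    quota = {b: int(r) for b, r in raw.items()}   # floor of each share
    leftover = n - sum(quota.values())
    # Give the remaining units to the bins with the largest fractional parts.
    for b in sorted(raw, key=lambda b: raw[b] - quota[b], reverse=True)[:leftover]:
        quota[b] += 1
    return quota

def aligned_subsample(pool, reference, n, seed=0):
    """Draw n members of `pool` (real-world) whose joint-bin distribution
    matches `reference` (virtual cohort). Raises ValueError when a bin
    holds fewer pool members than its quota."""
    rng = random.Random(seed)
    ref_counts = Counter(joint_bin(p) for p in reference)
    by_bin = defaultdict(list)
    for p in pool:
        by_bin[joint_bin(p)].append(p)
    sample = []
    for b, k in allocate(ref_counts, n).items():
        if k > len(by_bin[b]):
            raise ValueError(f"bin {b} needs {k} participants, pool has {len(by_bin[b])}")
        sample.extend(rng.sample(by_bin[b], k))   # sampling without replacement
    return sample
```

Sampling is without replacement within each bin. As the target size grows, some bin's quota eventually exceeds what the real-world pool holds. The `ValueError` marks that point, which is one reason a maximal aligned subsample size exists.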
Problem

Ensures demographic alignment between real and virtual populations
Selects subsamples matching joint distribution of key covariates
Validates sufficient subsample size for stable performance evaluation
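
Stable performance evaluation here refers to AUC estimates computed from lung cancer risk scores. A dependency-free way to compute AUC is the Mann-Whitney formulation. It is the probability that a randomly chosen positive case scores higher than a randomly chosen negative one, with ties counted as one half. The sketch below also computes a per-subgroup (stratified) AUC and an AUC-versus-size curve over nested random subsamples. The helper names are hypothetical. Drawing nested random prefixes is a simplification; the source does not say how its subsamples at each size were drawn.

```python
import random
from collections import defaultdict

def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic: P(score_pos > score_neg),
    counting ties as 1/2. Labels are 1 (cancer) or 0 (no cancer)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return float("nan")   # AUC is undefined without both classes
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def stratified_auc(scores, labels, groups):
    """AUC computed separately within each demographic subgroup."""
    by_group = defaultdict(lambda: ([], []))
    for s, y, g in zip(scores, labels, groups):
        by_group[g][0].append(s)
        by_group[g][1].append(y)
    return {g: auc(s, y) for g, (s, y) in by_group.items()}

def auc_by_size(scores, labels, sizes, seed=0):
    """AUC on nested random subsamples of increasing size; a plateau
    suggests the sample is large enough for a stable estimate."""
    order = list(range(len(scores)))
    random.Random(seed).shuffle(order)
    return {n: auc([scores[i] for i in order[:n]], [labels[i] for i in order[:n]])
            for n in sizes}
```

The pairwise loop costs O(n_pos × n_neg), which is adequate for a sketch. A rank-based computation, or `sklearn.metrics.roc_auc_score`, scales better. A subgroup with no positive or no negative cases returns NaN, since AUC is undefined there.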
Innovation

Statistical framework for demographic alignment
Multidimensional binning for covariate matching
Maximal subsample identification with demographic similarity
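
A simple capacity argument shows why a maximal aligned size exists. Let p_b be joint bin b's share of the reference cohort and c_b the number of real-world participants available in that bin. Exact proportional matching at size n requires n · p_b ≤ c_b for every bin, so n cannot exceed min_b c_b / p_b. The sketch below computes that bound, and the bin that limits it, from joint-bin counts.

This is an illustrative assumption rather than the DISTINCT procedure. The abstract identifies the maximal size by tracking Wasserstein and K-S distances over progressively larger subsamples, which tolerates small deviations. Rounding quotas to integers can also shift the exact cutoff.

```python
from collections import Counter

def max_alignable_size(ref_counts, pool_counts):
    """Upper bound on n for exact proportional matching over joint bins.

    ref_counts:  joint bin -> count in the reference (virtual) cohort
    pool_counts: joint bin -> count available in the real-world pool
    Returns (n_max, bottleneck_bin).
    """
    total = sum(ref_counts.values())
    # Bin b needs n * ref_b / total participants, so n <= pool_b * total / ref_b.
    # Integer floor division keeps the bound exact.
    limits = {b: pool_counts.get(b, 0) * total // c
              for b, c in ref_counts.items() if c > 0}
    bottleneck = min(limits, key=limits.get)
    return limits[bottleneck], bottleneck

if __name__ == "__main__":
    # Toy counts: bin "A" is 75% of the reference but half of the pool.
    ref = Counter({"A": 3, "B": 1})
    pool = Counter({"A": 10, "B": 10})
    print(max_alignable_size(ref, pool))  # → (13, 'A')
```

The bottleneck bin identifies the demographic stratum whose scarcity in the real-world pool caps the aligned subsample.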