🤖 AI Summary
To address the poor generalizability and suboptimal prognostic performance of machine learning models trained on small-sample clinical datasets, this paper proposes a synthetic data augmentation framework designed with clinical interpretability in mind. The framework integrates generative models (including GANs, VAEs, and SMOTE variants) and employs a dual-criterion selection strategy based on generated-data diversity and AUC gain. It further introduces an interpretable decision-support model to assess whether augmentation is applicable, uncovering systematic associations between augmentation efficacy and intrinsic data characteristics (e.g., number of observations, baseline AUC, cardinality of categorical variables, outcome balance). Evaluated on seven real-world small-sample medical datasets, the framework achieves an average relative AUC improvement of 15.55% (range 4.31% to 43.23%), significantly outperforming resampling alone (p = 0.016), while yielding augmented data with significantly higher diversity than resampled data (p = 0.046).
📝 Abstract
Small datasets are common in health research. However, machine learning models trained on small datasets tend to generalize poorly. Data augmentation is one solution: it increases sample size and can be seen as a form of regularization that increases the diversity of small datasets, so that models trained on them perform better on unseen data. We found that augmentation improves prognostic performance for datasets that have fewer observations, a smaller baseline AUC, higher-cardinality categorical variables, and more balanced outcome variables. No specific generative model consistently outperformed the others. We developed a decision support model that analysts can use to judge whether augmentation would be useful. For seven small application datasets, augmenting the existing data resulted in a relative increase in AUC of between 4.31% (AUC from 0.71 to 0.75) and 43.23% (AUC from 0.51 to 0.73), with an average relative improvement of 15.55%, demonstrating the nontrivial impact of augmentation on small datasets (p=0.0078). AUC with augmentation was higher than AUC with resampling only (p=0.016), and the diversity of augmented datasets was higher than that of resampled datasets (p=0.046).
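The percentages above are relative gains over the baseline AUC, not absolute differences in AUC. A minimal sketch of that arithmetic follows; the function name `relative_auc_gain` and the dictionary labels are illustrative and do not come from the paper, and the inputs are the rounded AUC values quoted in the abstract.

```python
def relative_auc_gain(baseline_auc: float, augmented_auc: float) -> float:
    """Return the relative AUC improvement over the baseline, in percent."""
    if baseline_auc <= 0:
        raise ValueError("baseline AUC must be positive")
    return 100.0 * (augmented_auc - baseline_auc) / baseline_auc


# Rounded (baseline, augmented) AUC pairs quoted in the abstract.
reported_pairs = {
    "smallest gain": (0.71, 0.75),
    "largest gain": (0.51, 0.73),
}

# Relative gain in percent for each quoted pair.
gains = {
    label: relative_auc_gain(baseline, augmented)
    for label, (baseline, augmented) in reported_pairs.items()
}

for label, gain in gains.items():
    print(f"{label}: {gain:.2f}%")
```

With the rounded inputs, the largest gain comes out at about 43.1% and the smallest at about 5.6%. The abstract reports 43.23% and 4.31%, presumably because those figures were computed from unrounded AUC values, so the two-decimal AUCs reproduce them only approximately.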