🤖 AI Summary
This study investigates whether effect size measures (e.g., Cohen’s *d*) can serve as prospective proxies for data sufficiency, specifically whether they can predict model performance (classification accuracy) and training convergence speed. We conduct systematic supervised learning experiments across varying sample sizes and learning rates, and quantitatively assess the statistical association between effect size and model behavior. Our empirical evaluation reveals no robust correlation between effect size and either accuracy or convergence rate, suggesting that effect size is unreliable for sample-size planning or performance forecasting. These findings expose limitations of conventional descriptive statistics in assessing data sufficiency, challenge the implicit assumption that effect size serves as a valid proxy for data quality, and underscore the need for a new evaluation framework integrating statistical learning theory with explicit modeling of data-generating mechanisms.
📝 Abstract
Having a sufficient quantity of quality data is critical to training effective machine learning models. The ability to determine whether a dataset is adequate before training a model and evaluating its performance would be an essential tool for anyone engaged in experimental design or data collection. Despite this need, prospectively assessing data sufficiency remains an elusive capability. We report here on two experiments undertaken to ascertain whether basic descriptive statistical measures can indicate how effective a dataset will be for training a model. Using the effect size of our features, this work first explores whether a correlation exists between effect size and resulting model performance (theorizing that the magnitude of the distinction between classes could correlate with a classifier's resulting success). We then explore whether the magnitude of the effect size affects the rate at which the model converges during training (theorizing again that a greater effect size may indicate that the model will converge more rapidly and with a smaller sample size). Our results appear to indicate that effect size is not an effective heuristic for determining adequate sample size or projecting model performance, and therefore that additional work is still needed to prospectively assess the adequacy of data.
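For readers unfamiliar with the quantity being tested, the following is a minimal sketch of a per-feature effect size between two classes, assuming Cohen's *d* with a pooled standard deviation. The abstract does not state which effect size formula or implementation was used, so the function name `cohens_d` and the sample values are illustrative assumptions only.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for one feature measured in two classes.

    Assumption: pooled standard deviation built from sample variances
    (n - 1 denominator). This is one common definition, not necessarily
    the one used in the study.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")

    # Weight each class's sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")

    # Standardized difference between the two class means.
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical feature values for two classes (not data from the study).
class_a = [2, 4, 6]
class_b = [1, 3, 5]
print(cohens_d(class_a, class_b))  # 0.5
```

In an experiment of the kind described, a value like this would be computed for each feature and then compared against the trained classifier's accuracy or convergence behavior to check for an association.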