🤖 AI Summary
This study addresses the challenge of mixed-resolution data commonly encountered in healthcare and public institutions, where low-resolution samples arise due to storage constraints, privacy considerations, or limited resources, and their utility for high-resolution tasks remains unclear. The work proposes, for the first time, an information-theoretic framework based on Kullback–Leibler divergence to quantify the information loss induced by downsampling and derives theoretical bounds on the relative contributions of high- and low-resolution samples to model training. Through extensive experiments integrating multi-resolution data within both Vision Transformers and convolutional neural networks, the study demonstrates that incorporating low-resolution data consistently and significantly enhances model performance when high-resolution data are scarce.
📝 Abstract
Artificial intelligence systems typically rely on large, centrally collected datasets, a premise that does not hold in many real-world domains such as healthcare and public institutions. In these settings, data sharing is often constrained by storage, privacy, or resource limitations. For example, small wearable devices may lack the bandwidth or energy capacity needed to store and transmit high-resolution data, leading to aggregation during data collection and thus a loss of information. As a result, datasets collected from different sources may consist of a mixture of high- and low-resolution samples. Despite the prevalence of this setting, it remains unclear how informative low-resolution data is when models are ultimately evaluated on high-resolution inputs. We provide a theoretical analysis based on the Kullback-Leibler divergence that characterises how the influence of a datapoint changes with resolution, and derive bounds that relate the relative contribution of high- and low-resolution observations to the information lost under downsampling. To support this analysis, we empirically demonstrate, using both a vision transformer and a convolutional neural network, that adding low-resolution data to the training set consistently improves performance when high-resolution data is scarce.