🤖 AI Summary
This paper addresses optimal experimental design for unlabeled image datasets. We propose a regret-minimization framework with entropy regularization that provably and efficiently selects small, representative subsets. Our key contribution is the first incorporation of an entropy regularizer into regret-minimization-based optimal design, yielding a theoretical guarantee that the selected subset achieves a (1+ε)-approximation to the optimal design; the analysis extends naturally to regularized design settings. The method relies solely on convex optimization and operates in a fully unsupervised manner, requiring no labels. Experiments on MNIST, CIFAR-10, and ImageNet-50 demonstrate that samples selected by our approach significantly improve downstream classifier performance (e.g., logistic regression), outperforming state-of-the-art sampling baselines. These results validate both the sample efficiency and the generalization benefit of our method.
📝 Abstract
We explore extensions and applications of the regret minimization framework introduced by~\cite{design} for solving optimal experimental design problems. Specifically, we incorporate an entropy regularizer into this framework, leading to a novel sample selection objective and a provable sample complexity bound that guarantees a $(1+\epsilon)$-near-optimal solution. We further extend the method to handle regularized optimal design settings. As an application, we use our algorithm to select a small set of representative samples from image classification datasets without relying on label information. To evaluate the quality of the selected samples, we train a logistic regression model and compare performance against several baseline sampling strategies. Experimental results on MNIST, CIFAR-10, and a 50-class subset of ImageNet show that our approach consistently outperforms competing methods in most cases.
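To make the entropy-regularized design objective concrete, the following is an illustrative sketch (not the paper's regret-minimization algorithm) of the continuous relaxation it targets: maximizing $\log\det\big(\sum_i \pi_i x_i x_i^\top\big) + \lambda H(\pi)$ over the probability simplex, solved here with simple exponentiated-gradient ascent. The function name, step size, and iteration count are illustrative assumptions, not quantities from the paper.

```python
import numpy as np

def entropy_regularized_d_design(X, lam=0.1, eta=0.5, iters=200):
    """Illustrative sketch: maximize log det(sum_i pi_i x_i x_i^T) + lam * H(pi)
    over the simplex via exponentiated-gradient (mirror) ascent.
    Note: this is NOT the paper's regret-minimization algorithm, just the
    convex objective it relates to."""
    n, d = X.shape
    pi = np.full(n, 1.0 / n)           # start at the uniform distribution
    for _ in range(iters):
        # Information matrix A(pi) = sum_i pi_i x_i x_i^T (ridge for stability)
        A = X.T @ (pi[:, None] * X) + 1e-8 * np.eye(d)
        Ainv = np.linalg.inv(A)
        # d/dpi_i log det A = x_i^T A^{-1} x_i; entropy gradient: -(log pi_i + 1)
        g = np.einsum('ij,jk,ik->i', X, Ainv, X) - lam * (np.log(pi + 1e-12) + 1.0)
        w = eta * g
        w -= w.max()                   # shift for numerical stability of exp
        pi = pi * np.exp(w)
        pi /= pi.sum()                 # project back onto the simplex
    return pi

# Usage: weights concentrate on informative points; a small subset could then
# be formed by taking, e.g., the top-k entries of pi.
X = np.random.default_rng(0).normal(size=(20, 3))
pi = entropy_regularized_d_design(X)
```

The entropy term spreads mass across near-duplicate points, which is one intuition for why regularized designs can yield more representative subsets than unregularized D-optimal weights.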