🤖 AI Summary
In online evaluation of generative models, there is a practical need to rapidly identify the optimal model—i.e., the one achieving the highest Fréchet Inception Distance (FID) or Inception Score (IS)—using minimal query samples.
Method: This paper introduces, for the first time, optimism-based multi-armed bandits (MAB) to this setting, proposing FID-UCB and IS-UCB algorithms. Leveraging upper confidence bounds (UCB), these methods dynamically allocate evaluation budgets across models and are theoretically guaranteed to achieve sublinear regret, ensuring efficient convergence.
Contribution/Results: We establish the first theoretical framework for online generative model evaluation and present the first practical algorithms directly optimizing FID/IS with rigorous regret guarantees. Experiments on standard image datasets demonstrate that our approaches reduce sample consumption by up to several-fold compared to offline evaluation baselines, while maintaining high accuracy in identifying the optimal model.
📝 Abstract
Existing frameworks for evaluating and comparing generative models typically target an offline setting, where the evaluator has access to full batches of data produced by the models. However, in many practical scenarios, the goal is to identify the best model using the fewest generated samples to minimize the costs of querying data from the models. Such an online comparison is challenging with current offline assessment methods. In this work, we propose an online evaluation framework to find the generative model that maximizes a standard assessment score among a group of available models. Our method uses an optimism-based multi-armed bandit framework to identify the model producing data with the highest evaluation score, quantifying the quality and diversity of generated data. Specifically, we study the online assessment of generative models based on the Fr'echet Inception Distance (FID) and Inception Score (IS) metrics and propose the FID-UCB and IS-UCB algorithms leveraging the upper confidence bound approach in online learning. We prove sub-linear regret bounds for these algorithms and present numerical results on standard image datasets, demonstrating their effectiveness in identifying the score-maximizing generative model.