🤖 AI Summary
This work addresses a limitation of current evaluation methods for generative virtual staining (VS) models: they check only the marginal distribution over the dataset and do not assess the quality of the posterior predicted for each individual cell. To close this gap, the authors propose information gain (IG) as a cell-level evaluation framework. IG is a strictly proper scoring rule with a sound theoretical motivation, which makes its values interpretable and comparable across models and features. The approach is evaluated on diffusion- and GAN-based models using an extensive high-throughput screening (HTS) dataset. The results show that IG can reveal substantial performance differences that other metrics cannot, separating generative models by how well they capture single-cell posterior distributions.
📝 Abstract
Generative virtual staining (VS) models for high-throughput screening (HTS) can provide an estimated posterior distribution of possible biological feature values for each input and cell. However, when evaluating a VS model, the true posterior is unavailable. Existing evaluation protocols only check the accuracy of the marginal distribution over the dataset rather than that of the predicted posteriors. We introduce information gain (IG) as a cell-wise evaluation framework that enables direct assessment of predicted posteriors. IG is a strictly proper scoring rule with a sound theoretical motivation, which makes it interpretable and allows results to be compared across models and features. We evaluate diffusion- and GAN-based models on an extensive HTS dataset using IG and other metrics, and show that IG can reveal substantial performance differences that other metrics cannot.
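The abstract does not give the formula for IG, so the following is only a minimal sketch of one plausible reading: the per-cell gain in log score (a strictly proper scoring rule) of the predicted posterior over a dataset-level marginal baseline, measured in bits. The function names, the Gaussian fit to the posterior samples, and the choice of baseline are all assumptions made for illustration, not the authors' definition.

```python
import math
import statistics


def gaussian_log_density(y, mean, std):
    """Log density (in nats) of a univariate Gaussian at y."""
    return -0.5 * math.log(2 * math.pi * std**2) - 0.5 * ((y - mean) / std) ** 2


def information_gain_bits(y_true, posterior_samples, reference_values):
    """Per-cell log-score gain of a predicted posterior over a marginal baseline.

    Hypothetical helper, not taken from the paper.

    y_true:            measured feature value for each cell
    posterior_samples: for each cell, a list of feature values sampled
                       from the model's predicted posterior
    reference_values:  feature values used to fit the marginal baseline
                       (assumption: a single Gaussian over the dataset)
    """
    ref_mean = statistics.fmean(reference_values)
    ref_std = statistics.stdev(reference_values)

    gains = []
    for y, samples in zip(y_true, posterior_samples):
        # Assumption: each posterior is summarised by a Gaussian fit
        # to its samples; a KDE or histogram could be used instead.
        post_mean = statistics.fmean(samples)
        post_std = statistics.stdev(samples)

        log_p_post = gaussian_log_density(y, post_mean, post_std)
        log_p_ref = gaussian_log_density(y, ref_mean, ref_std)

        # Convert the difference from nats to bits.
        gains.append((log_p_post - log_p_ref) / math.log(2))
    return gains
```

Under this reading, a model whose posteriors merely reproduce the marginal distribution scores close to zero bits per cell even though its dataset-level marginal is correct, while a model whose posteriors concentrate around each cell's true value scores positively. That is the kind of difference a marginal-only metric cannot see.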