🤖 AI Summary
Existing image generation evaluation metrics, such as the Fréchet Inception Distance (FID), have limited sensitivity to fine-grained visual discrepancies, higher-order distributional moments, and tail behavior, and they lack statistical rigor. To address these limitations, we propose the Embedded Characteristic Score (ECS), the first metric to formally connect distributional assessment with higher-order moments and tail characteristics, thereby overcoming FID's insensitivity to non-Gaussian tail mismatches. ECS is grounded in kernel embedding theory and characteristic function analysis, and it is validated through Monte Carlo simulations and empirical image experiments on both synthetic and real-world benchmarks. The results show that ECS significantly outperforms FID and other state-of-the-art metrics in detecting distributional shifts, mode collapse, and tail misalignment, providing a statistically principled, interpretable, and robust evaluation framework for generative models.
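FID fits a Gaussian to each set of embeddings and reports the Fréchet distance between the two fits, ||μ_x − μ_y||² + Tr(Σ_x + Σ_y − 2(Σ_x Σ_y)^{1/2}), so it only sees means and covariances. The minimal NumPy sketch below illustrates the tail blind spot under stated assumptions: raw 2-D samples stand in for Inception embeddings, and the helper names `frechet_distance` and `kurtosis` are hypothetical. A Gaussian sample and a Student-t sample rescaled to unit variance get a near-zero Fréchet distance even though their kurtosis differs sharply.

```python
import numpy as np

def frechet_distance(x, y):
    """Fréchet distance between Gaussian fits to samples x, y of shape (n, d).

    This is the formula behind FID; here it is applied to raw samples
    instead of Inception embeddings (an assumption for illustration).
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.atleast_2d(np.cov(x, rowvar=False))
    cov_y = np.atleast_2d(np.cov(y, rowvar=False))
    # Tr((cov_x cov_y)^{1/2}) is the sum of square roots of the eigenvalues of
    # cov_x @ cov_y, which are real and non-negative for PSD covariances.
    eig = np.linalg.eigvals(cov_x @ cov_y)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0.0, None)).sum()
    mean_term = np.sum((mu_x - mu_y) ** 2)
    return float(mean_term + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)

def kurtosis(v):
    """Sample (non-excess) kurtosis of a 1-D array."""
    c = v - v.mean()
    return float(np.mean(c ** 4) / np.mean(c ** 2) ** 2)

rng = np.random.default_rng(0)
n, d, df = 200_000, 2, 5
x = rng.standard_normal((n, d))
# Student-t rescaled to unit variance, since Var(t_df) = df / (df - 2).
y = rng.standard_t(df, size=(n, d)) * np.sqrt((df - 2) / df)

print(frechet_distance(x, y))  # close to 0: means and covariances match
print(kurtosis(x[:, 0]), kurtosis(y[:, 0]))  # population values: 3 vs 9
```

Because only the first two moments enter the formula, any pair of distributions with matching means and covariances scores near zero, regardless of how their tails differ.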
📝 Abstract
Generative models are ubiquitous in modern artificial intelligence (AI) applications. Recent advances have led to a variety of generative modeling approaches that are capable of synthesizing highly realistic samples. Despite these developments, evaluating the distributional match between the synthetic samples and the target distribution in a statistically principled way remains a core challenge. We focus on evaluating image generative models, where studies often treat human evaluation as the gold standard. Commonly adopted metrics, such as the Fréchet Inception Distance (FID), do not sufficiently capture the differences between the learned and target distributions, because the assumption of normality ignores differences in the tails. We propose the Embedded Characteristic Score (ECS), a comprehensive metric for evaluating the distributional match between the learned and target sample distributions, and explore its connection with moments and tail behavior. We derive natural properties of ECS and show its practical use via simulations and an empirical study.
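The abstract does not give the formula for ECS, so the sketch below is not the authors' metric. It is a generic empirical characteristic function discrepancy, included only to illustrate why characteristic functions are a natural tool here: the characteristic function φ(t) = E[exp(i⟨t, X⟩)] determines a distribution uniquely, and its derivatives at the origin encode the moments when they exist. The helper names `ecf` and `ecf_discrepancy`, the Gaussian choice of frequencies, and the sample sizes are all assumptions for illustration.

```python
import numpy as np

def ecf(x, freqs):
    """Empirical characteristic function of samples x (n, d) at freqs (m, d)."""
    proj = x @ freqs.T  # (n, m) inner products <t, x_i>
    return np.cos(proj).mean(axis=0) + 1j * np.sin(proj).mean(axis=0)

def ecf_discrepancy(x, y, n_freq=64, seed=0):
    """Mean squared gap between two empirical characteristic functions.

    Hypothetical illustration, not the paper's ECS. Frequencies are drawn
    from a standard Gaussian, which is an arbitrary choice for this sketch.
    """
    rng = np.random.default_rng(seed)
    freqs = rng.standard_normal((n_freq, x.shape[1]))
    diff = ecf(x, freqs) - ecf(y, freqs)
    return float(np.mean(np.abs(diff) ** 2))

rng = np.random.default_rng(1)
n, d, df = 50_000, 2, 5
x = rng.standard_normal((n, d))  # reference Gaussian sample
z = rng.standard_normal((n, d))  # second Gaussian sample (same distribution)
# Heavy-tailed sample with the same mean and covariance as x.
y = rng.standard_t(df, size=(n, d)) * np.sqrt((df - 2) / df)

gap_tail = ecf_discrepancy(x, y)  # Gaussian vs. unit-variance Student-t
gap_null = ecf_discrepancy(x, z)  # Gaussian vs. Gaussian (sampling noise only)
print(gap_tail, gap_null)
```

Unlike a comparison of Gaussian fits, this discrepancy reacts to the heavier tails of the Student-t sample even though its mean and covariance match the Gaussian one, while two samples from the same distribution differ only by sampling noise.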