🤖 AI Summary
Deep generative models frequently exhibit likelihood inversion in out-of-distribution (OOD) image detection, assigning higher likelihood to OOD inputs than to in-distribution ones, which undermines their reliability. This work demonstrates that the likelihood itself is not inherently flawed; rather, the failure stems from modeling directly in the raw pixel space. Our key innovation is to shift likelihood estimation into the representation space of pretrained vision encoders (e.g., ViT or ResNet) and to compute the likelihood efficiently via the probability flow ODE of diffusion models. To our knowledge, this is the first approach to establish, both theoretically and empirically, that likelihood recovers its validity as an effective OOD detection score when computed in a suitable semantic representation space. Evaluated across multiple standard benchmarks, our method significantly outperforms pixel-space likelihood baselines and matches the accuracy of current top-performing OOD detectors.
📝 Abstract
Out-of-distribution (OOD) detection is critical for ensuring the reliability of deep learning systems, particularly in safety-critical applications. Likelihood-based deep generative models have historically been criticized for their unsatisfactory OOD detection performance: when applied to image data, they often assign higher likelihood to OOD data than to in-distribution samples. In this work, we demonstrate that likelihood is not inherently flawed. Rather, several properties of the image space prevent likelihood from serving as a valid detection score. Given a sufficiently good likelihood estimator, specifically the probability flow formulation of a diffusion model, we show that likelihood-based methods can still perform on par with state-of-the-art methods when applied in the representation space of pre-trained encoders. The code is available at https://github.com/limchaos/Likelihood-OOD.git.
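To make the probability flow likelihood concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a variance-preserving SDE with a constant noise rate `BETA`, and it assumes the in-distribution "features" follow an isotropic Gaussian with variance `S0_SQ`, so the score is known in closed form and no trained network or real encoder is needed. All names (`BETA`, `T`, `S0_SQ`, `log_likelihood`, and so on) are hypothetical. The log-likelihood is obtained by integrating the ODE from data to prior and adding the accumulated divergence of the drift to the prior log-density; the result is then used directly as an OOD score (lower means more likely OOD).

```python
import math

# Toy setup (all values are assumptions for illustration only).
BETA = 1.0     # constant noise rate of the VP SDE: dx = -0.5*BETA*x dt + sqrt(BETA) dw
T = 20.0       # integration horizon; exp(-BETA*T) is negligible, so p_T is close to N(0, I)
S0_SQ = 0.25   # variance of the toy in-distribution features, x_0 ~ N(0, S0_SQ * I)


def gaussian_logpdf(x, var):
    """Log-density of an isotropic Gaussian N(0, var * I) at x."""
    d = len(x)
    return -0.5 * (d * math.log(2.0 * math.pi * var) + sum(xi * xi for xi in x) / var)


def marginal_var(t):
    """Variance of x_t under the VP SDE when x_0 ~ N(0, S0_SQ * I)."""
    decay = math.exp(-BETA * t)
    return S0_SQ * decay + (1.0 - decay)


def score(x, t):
    """Analytic score of p_t; a trained score network would replace this."""
    v = marginal_var(t)
    return [-xi / v for xi in x]


def drift(x, t):
    """Probability flow ODE drift: f(x, t) - 0.5 * g(t)^2 * score(x, t)."""
    s = score(x, t)
    return [-0.5 * BETA * xi - 0.5 * BETA * si for xi, si in zip(x, s)]


def divergence(x, t):
    """Exact divergence of the drift (available here because the toy drift is linear in x)."""
    return len(x) * (-0.5 * BETA * (1.0 - 1.0 / marginal_var(t)))


def _axpy(x, k, h):
    """Return x + h * k elementwise."""
    return [xi + h * ki for xi, ki in zip(x, k)]


def log_likelihood(x0, steps=2000):
    """log p_0(x0) = log p_T(x(T)) + integral over [0, T] of div(drift), via RK4."""
    h = T / steps
    x, div_int, t = list(x0), 0.0, 0.0
    for _ in range(steps):
        k1, d1 = drift(x, t), divergence(x, t)
        x2 = _axpy(x, k1, 0.5 * h)
        k2, d2 = drift(x2, t + 0.5 * h), divergence(x2, t + 0.5 * h)
        x3 = _axpy(x, k2, 0.5 * h)
        k3, d3 = drift(x3, t + 0.5 * h), divergence(x3, t + 0.5 * h)
        x4 = _axpy(x, k3, h)
        k4, d4 = drift(x4, t + h), divergence(x4, t + h)
        x = [xi + h / 6.0 * (a + 2.0 * b + 2.0 * c + e)
             for xi, a, b, c, e in zip(x, k1, k2, k3, k4)]
        div_int += h / 6.0 * (d1 + 2.0 * d2 + 2.0 * d3 + d4)
        t += h
    # The prior N(0, I) approximates p_T; the mismatch is of order exp(-BETA*T).
    return gaussian_logpdf(x, 1.0) + div_int


# Hypothetical feature vectors: one near the in-distribution mode, one far from it.
x_in = [0.3, -0.2, 0.1, 0.4]
x_ood = [2.0, -1.5, 1.8, 2.2]
print("in-distribution log-likelihood:", log_likelihood(x_in))
print("OOD log-likelihood:            ", log_likelihood(x_ood))
```

In this toy case the ODE-based estimate can be checked against the closed-form Gaussian log-density. With a learned score network the divergence is generally not available in closed form and would have to be computed or estimated numerically, and the feature vectors would come from a pre-trained encoder rather than being written by hand.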