🤖 AI Summary
This work addresses a critical weakness of food recognition models: they cannot reliably distinguish in-distribution (ID) classes from out-of-distribution (OOD) samples, so in real-world applications such as automated dietary assessment they often assign an ID label to an OOD input. We conduct an empirical study of post-hoc OOD detection for fine-grained food recognition, a setting that prior work has not explored. We systematically evaluate several post-hoc methods, including ViM (Virtual-logit Matching), and find that ViM performs best overall on standard OOD detection metrics (e.g., AUROC and FPR95). We further observe that models with higher ID classification accuracy also detect OOD samples better, and that Transformer-based architectures consistently outperform CNN baselines across the evaluated OOD detection methods. These findings support more robust and safer food recognition systems in open-world settings and offer practical guidance for identifying unknown categories.
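For readers unfamiliar with the two metrics named above, the following is a minimal NumPy sketch of how AUROC and FPR95 are commonly computed from detector scores. It assumes the convention that a higher score means "more ID-like"; the function names `auroc` and `fpr_at_95_tpr` are illustrative and not taken from the paper, and the paper's exact evaluation code may differ.

```python
import numpy as np

def auroc(id_scores, ood_scores):
    """Probability that a random ID sample scores above a random OOD sample.

    Ties count as half. Assumes higher score = more ID-like.
    Uses an all-pairs comparison, so it is meant for small illustrative inputs.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    greater = (id_scores[:, None] > ood_scores[None, :]).sum()
    ties = (id_scores[:, None] == ood_scores[None, :]).sum()
    return (greater + 0.5 * ties) / (id_scores.size * ood_scores.size)

def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OOD samples accepted as ID when about 95% of ID samples are accepted."""
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # Threshold at the 5th percentile of ID scores: roughly 95% of ID scores lie at or above it.
    threshold = np.percentile(id_scores, 5)
    return float(np.mean(ood_scores >= threshold))
```

A higher AUROC and a lower FPR95 both indicate better separation between ID and OOD inputs.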
📝 Abstract
Food recognition models often struggle to distinguish between seen and unseen samples, frequently misclassifying samples from unseen categories by assigning them an in-distribution (ID) label. This misclassification presents significant challenges when deploying these models in real-world applications, particularly within automatic dietary assessment systems, where incorrect labels can lead to cascading errors throughout the system. Ideally, such models should prompt the user when an unknown sample is encountered, allowing for corrective action. Because no prior research has explored this problem for food recognition in real-world settings, in this work we conduct an empirical analysis of various post-hoc out-of-distribution (OOD) detection methods for fine-grained food recognition. Our findings indicate that virtual-logit matching (ViM) performed best overall, likely because it combines logits with feature-space representations. Additionally, our work reinforces prior observations in the OOD literature: models with higher ID accuracy performed better across the evaluated OOD detection methods. Furthermore, transformer-based architectures consistently outperformed convolution-based models in detecting OOD samples across the evaluated methods.
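To illustrate what "combining logits and feature-space representations" can look like, here is a rough NumPy sketch in the spirit of ViM. It is a simplified illustration, not the authors' implementation: the function names `fit_vim` and `vim_score`, the argument `principal_dim`, and the assumption that penultimate-layer features and the final linear layer (`W`, `b`) are available as arrays are all hypothetical choices made for this example.

```python
import numpy as np

def _logsumexp(x):
    """Row-wise log-sum-exp, computed stably."""
    m = x.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(x - m).sum(axis=1, keepdims=True))).ravel()

def fit_vim(train_feats, W, b, principal_dim):
    """Estimate the quantities needed for a ViM-style score from ID training features.

    train_feats: (n, feat_dim) penultimate-layer features of ID training images.
    W: (num_classes, feat_dim) and b: (num_classes,) are the classifier head.
    principal_dim: number of high-variance directions treated as the principal subspace.
    """
    # New origin that absorbs the bias term (exact when W has full row rank).
    origin = -np.linalg.pinv(W) @ b
    centered = train_feats - origin
    cov = centered.T @ centered / len(centered)
    _, eigvecs = np.linalg.eigh(cov)  # eigenvalues returned in ascending order
    # Keep the lowest-variance directions: the complement of the principal subspace.
    null_space = eigvecs[:, : centered.shape[1] - principal_dim]
    residual = np.linalg.norm(centered @ null_space, axis=1)
    train_logits = train_feats @ W.T + b
    # Scale so the virtual logit is comparable in magnitude to the largest real logit.
    alpha = train_logits.max(axis=1).sum() / residual.sum()
    return origin, null_space, alpha

def vim_score(feats, W, b, origin, null_space, alpha):
    """Higher score = more ID-like (energy of the logits minus the virtual logit)."""
    logits = feats @ W.T + b
    virtual_logit = alpha * np.linalg.norm((feats - origin) @ null_space, axis=1)
    return _logsumexp(logits) - virtual_logit
```

The design intuition is that the logits capture class-dependent evidence while the residual norm captures how far a feature lies from the subspace occupied by ID data, so a sample can be flagged as OOD by either signal. Thresholding the resulting score would then decide when to prompt the user about an unknown sample.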