🤖 AI Summary
This paper addresses the problem of predicting when AI image recognition models will make erroneous predictions on in-distribution (ID), out-of-distribution (OOD), and adversarial examples. We propose SuperMentor, a generalizable "oracle" model that unifies the modeling of all three error types within a single homogeneous mentor architecture. Methodologically, SuperMentor integrates deep CNNs and Transformers, employs error distillation training, and incorporates a cross-architecture generalization design, enabling plug-and-play error prediction across diverse mentee models. Our key contributions are: (1) the first framework that models and jointly predicts ID, OOD, and adversarial errors; (2) state-of-the-art error prediction on ImageNet-1K, significantly outperforming all baseline mentors, especially on adversarial examples with small perturbations, with strong transfer of error prediction across error types; and (3) robust generalization across heterogeneous mentee architectures, establishing a new way to improve reliability in high-stakes domains such as healthcare, finance, and autonomous driving.
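One way to read "unified modeling of all three error types" is that every image, whatever its source, receives the same binary target: did the mentee get it wrong? The sketch below illustrates that idea under the assumption that mentor targets are simple correct/incorrect labels; the summary does not say how SuperMentor builds its targets, and the names `Sample`, `mentor_targets`, and `error_rate_by_source`, along with the `"id"`/`"ood"`/`"adv"` tags, are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Sample:
    source: str       # hypothetical tag: "id", "ood", or "adv"
    true_label: int   # ground-truth class index
    mentee_pred: int  # class index predicted by the mentee


def mentor_targets(samples: Sequence[Sample]) -> List[int]:
    """Binary mentor target: 1 if the mentee is wrong, 0 if it is right.

    The same rule is applied to every source, so ID, OOD, and adversarial
    errors share one label space (an assumption, not the paper's recipe).
    """
    return [int(s.mentee_pred != s.true_label) for s in samples]


def error_rate_by_source(samples: Sequence[Sample]) -> Dict[str, float]:
    """Fraction of mentee errors per source, e.g. to check label balance."""
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for s, t in zip(samples, mentor_targets(samples)):
        totals[s.source] += 1
        errors[s.source] += t
    return {src: errors[src] / totals[src] for src in totals}


samples = [
    Sample("id", 3, 3),
    Sample("ood", 1, 7),
    Sample("adv", 5, 2),
    Sample("adv", 5, 5),
]
print(mentor_targets(samples))        # [0, 1, 1, 0]
print(error_rate_by_source(samples))  # {'id': 0.0, 'ood': 1.0, 'adv': 0.5}
```

Collapsing the three error types into one binary target is what makes a single mentor applicable to all of them; the per-source error rates matter because the error/non-error balance can differ sharply between clean and adversarial data.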
📝 Abstract
AI models make mistakes when recognizing images, whether in-domain, out-of-domain, or adversarial. Predicting these errors is critical for improving system reliability, reducing costly mistakes, and enabling proactive corrections in real-world applications such as healthcare, finance, and autonomous systems. However, understanding what mistakes AI models make, why they occur, and how to predict them remains an open challenge. Here, we conduct comprehensive empirical evaluations using a "mentor" model, a deep neural network designed to predict the errors of another "mentee" model. Our findings show that the mentor excels at learning from a mentee's mistakes on adversarial images with small perturbations and generalizes effectively to predict the mentee's in-domain and out-of-domain errors. Additionally, transformer-based mentor models excel at predicting errors across various mentee architectures. Building on these observations, we develop an "oracle" mentor model, dubbed SuperMentor, that outperforms baseline mentors in predicting errors across different error types on the ImageNet-1K dataset. Our framework paves the way for future research on anticipating and correcting AI model behaviors, ultimately increasing trust in AI systems. All code, models, and data will be made publicly available.
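To make "learning from a mentee's mistakes on adversarial images with small perturbations" concrete, here is a minimal sketch of how such training pairs could be produced. The abstract does not name the attack or the mentee; this sketch swaps in the fast gradient sign method (FGSM) against a toy two-feature logistic-regression mentee standing in for an image classifier. All function names are hypothetical, and the resulting `(x_adv, error_flag)` pairs are what a mentor network would then be trained on.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sign(v: float) -> float:
    return float((v > 0) - (v < 0))


def mentee_predict(x: Vector, w: Vector, b: float) -> int:
    """Toy binary mentee: logistic regression thresholded at probability 0.5."""
    return int(dot(w, x) + b > 0)


def fgsm(x: Vector, y: int, w: Vector, b: float, eps: float) -> Vector:
    """One FGSM step against the toy mentee.

    For binary cross-entropy, d(loss)/dx = (sigmoid(w.x + b) - y) * w,
    so each coordinate moves by eps in the sign of that gradient.
    """
    g = sigmoid(dot(w, x) + b) - y
    return [xi + eps * sign(g * wi) for xi, wi in zip(x, w)]


def mentor_training_set(
    xs: Sequence[Vector], ys: Sequence[int], w: Vector, b: float, eps: float
) -> List[Tuple[Vector, int]]:
    """Pair each adversarial input with 1 if the mentee now errs, else 0."""
    data = []
    for x, y in zip(xs, ys):
        x_adv = fgsm(x, y, w, b, eps)
        data.append((x_adv, int(mentee_predict(x_adv, w, b) != y)))
    return data


w, b = [1.0, -2.0], 0.0
x, y = [0.5, 0.1], 1
print(mentee_predict(x, w, b))      # 1: correct on the clean input
x_adv = fgsm(x, y, w, b, eps=0.2)
print(mentee_predict(x_adv, w, b))  # 0: a small perturbation flips the prediction
```

Because each coordinate moves by at most `eps`, the adversarial input stays close to the clean one, yet the mentee's prediction can flip; the error flag records exactly which inputs the mentee gets wrong, which is the signal a mentor learns from.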