I Am Big, You Are Little; I Am Right, You Are Wrong

📅 2025-07-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Despite the growing diversity of image classification models, their decision-making mechanisms, particularly their reliance on critical pixels, remain poorly characterized and lack interpretable quantification. Method: This paper introduces “decision concentration,” quantified via the Minimal Sufficient Pixel Set (MSPS), to analyze statistical differences across state-of-the-art architectures, including ConvNeXt and EVA, in the spatial distribution of critical pixels, inter-sample overlap rates, and MSPS cardinality. Contribution/Results: We identify architecture-specific concentration patterns: ConvNeXt exhibits higher local concentration, relying on compact pixel regions, whereas EVA demonstrates broader, more global dependency. Moreover, misclassified samples are associated with larger MSPSs than correctly classified ones. This work reports an empirical association between model architecture and decision concentration, offering a quantitative approach to model diagnosis, architecture selection, and interpretability assessment.

📝 Abstract

Machine learning for image classification is an active and rapidly developing field. With the proliferation of classifiers of different sizes and architectures, the problem of choosing the right model becomes increasingly important. While we can assess a model's classification accuracy statistically, our understanding of how these models work is unfortunately limited. To gain insight into the decision-making process of different vision models, we propose using minimal sufficient pixel sets to gauge a model's 'concentration': the pixels that capture the essence of an image through the lens of the model. By comparing the position, overlap, and size of these pixel sets, we find that different architectures have statistically different concentration, in both size and position. In particular, ConvNeXt and EVA models differ markedly from the others. We also find that misclassified images are associated with larger pixel sets than correctly classified ones.
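To make the idea of a sufficient pixel set concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the procedure used in the paper, which this page does not describe. The function name sufficient_pixel_set, the saliency ranking, the zero baseline, and the toy classifier are all hypothetical. The sketch reveals pixels in descending saliency order until the masked image gets the same label as the full image, then prunes any pixel whose removal keeps that label. The result is minimal only with respect to single-pixel removal, not guaranteed globally minimal.

```python
import numpy as np


def sufficient_pixel_set(image, predict, saliency, baseline=0.0):
    """Return a boolean mask of pixels that alone preserve the model's label.

    Assumptions (hypothetical, for illustration only): `image` is a 2-D float
    array, `predict` maps an image to a class label, `saliency` ranks pixels,
    and hidden pixels are replaced by `baseline`.
    """
    target = predict(image)
    mask = np.zeros(image.shape, dtype=bool)
    masked = np.full(image.shape, baseline, dtype=float)
    if predict(masked) == target:
        return mask  # the blank image already gets the label: empty set

    # Forward pass: reveal pixels from most to least salient.
    order = np.argsort(saliency, axis=None)[::-1]
    for flat_idx in order:
        idx = np.unravel_index(flat_idx, image.shape)
        mask[idx] = True
        masked[idx] = image[idx]
        if predict(masked) == target:
            break

    # Backward pass: drop any revealed pixel that is not needed.
    for idx in zip(*np.nonzero(mask)):
        trial = masked.copy()
        trial[idx] = baseline
        if predict(trial) == target:
            masked = trial
            mask[idx] = False
    return mask


def toy_predict(img):
    """Toy stand-in for a classifier: label 1 if the centre 2x2 block is bright."""
    return int(img[1:3, 1:3].sum() > 2.0)


image = np.zeros((4, 4))
image[1:3, 1:3] = 1.0
mask = sufficient_pixel_set(image, toy_predict, saliency=image.copy())
print("pixels in set:", int(mask.sum()))
```

With a real vision model, predict would wrap a forward pass and an argmax, and the per-pixel loop would likely need to operate on patches or superpixels to stay tractable.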

Problem

Research questions and friction points this paper is trying to address.

Choosing the right image classification model among diverse architectures

Understanding decision-making processes of vision models via minimal pixel sets

Comparing concentration across model architectures and between correct and incorrect classifications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Using minimal sufficient pixel sets

Comparing pixel set position and overlap

Analyzing concentration differences across architectures
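The three comparisons named above (size, position, and overlap of pixel sets) can be sketched with simple mask statistics. This is an illustrative sketch only: the page does not say which position or overlap measures the paper uses, so the centroid and the Jaccard index below are assumptions, and the helper names set_size, centroid, and jaccard are hypothetical.

```python
import numpy as np


def set_size(mask):
    """Number of pixels in the set."""
    return int(mask.sum())


def centroid(mask):
    """Mean (row, column) position of the set, or None for an empty set."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return None
    return float(rows.mean()), float(cols.mean())


def jaccard(mask_a, mask_b):
    """Overlap of two sets: |A and B| / |A or B| (assumed measure)."""
    union = np.logical_or(mask_a, mask_b).sum()
    if union == 0:
        return 1.0  # two empty sets are treated as identical
    return float(np.logical_and(mask_a, mask_b).sum() / union)


# Two hypothetical pixel sets for the same image from two different models.
mask_a = np.zeros((4, 4), dtype=bool)
mask_a[0:2, 0:2] = True
mask_b = np.zeros((4, 4), dtype=bool)
mask_b[1:3, 1:3] = True

print(set_size(mask_a), set_size(mask_b))
print(centroid(mask_a), centroid(mask_b))
print(jaccard(mask_a, mask_b))
```

Collecting these statistics per image and per model would give the samples on which a statistical test between architectures could then be run.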
