The Art of Misclassification: Too Many Classes, Not Enough Points

📅 2025-02-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This paper addresses the inherent difficulty of classification under high-class-count, low-sample-size regimes. We propose an information-theoretic “class separability” metric grounded in entropy, which formally characterizes the irreducible inter-class overlap and uncertainty intrinsic to a dataset in feature space. Unlike prior measures, our metric is model-agnostic and sample-size-independent, enabling derivation of a fundamental theoretical upper bound on classification accuracy—i.e., a performance ceiling that no classifier can surpass. Leveraging entropy analysis and uncertainty modeling, we establish a tight generalization bound and empirically validate that this bound aligns closely with human perception of ambiguous decision boundaries. Our core contribution is the formal definition and quantification of the intrinsic solvability of a classification task, thereby providing a principled theoretical benchmark for algorithm design, model selection, and dataset evaluation.

📝 Abstract
Classification is a ubiquitous and fundamental problem in artificial intelligence and machine learning, with extensive efforts dedicated to developing more powerful classifiers and larger datasets. However, the classification task is ultimately constrained by the intrinsic properties of datasets, independently of computational power or model complexity. In this work, we introduce a formal entropy-based measure of classificability, which quantifies the inherent difficulty of a classification problem by assessing the uncertainty in class assignments given feature representations. This measure captures the degree of class overlap, aligns with human intuition, and serves as an upper bound on achievable classification performance. Our results establish a theoretical limit beyond which no classifier can improve the classification accuracy, regardless of the architecture or amount of data, in a given problem. Our approach provides a principled framework for understanding when classification is inherently fallible and fundamentally ambiguous.
Problem

Research questions and friction points this paper is trying to address.

Quantifies classification difficulty via entropy
Establishes theoretical limit for classifier accuracy
Provides framework for inherently fallible classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropy-based classificability measure
Quantifies class assignment uncertainty
Theoretical limit on classification accuracy
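The measure described above can be illustrated with a small sketch: for discrete features, the class-assignment uncertainty is the empirical conditional entropy H(Y|X), and the corresponding performance ceiling is the Bayes-optimal accuracy E_x[max_y p(y|x)], which no classifier can exceed on that data. This is an illustrative estimator under assumed discrete features; the function name and exact formulation are ours, not necessarily the paper's.

```python
from collections import Counter, defaultdict
from math import log2

def classificability(samples):
    """Estimate H(Y|X) in bits and the Bayes-accuracy ceiling from
    (feature, label) pairs with discrete feature values.

    Illustrative sketch only: assumes features are already discretized
    and that empirical frequencies approximate the true distribution.
    """
    n = len(samples)
    by_x = defaultdict(Counter)           # label counts per feature value
    for x, y in samples:
        by_x[x][y] += 1

    h_y_given_x = 0.0                     # empirical conditional entropy H(Y|X)
    ceiling = 0.0                         # Bayes accuracy: E_x[max_y p(y|x)]
    for label_counts in by_x.values():
        n_x = sum(label_counts.values())
        p_x = n_x / n
        h_y_given_x -= p_x * sum(
            (c / n_x) * log2(c / n_x) for c in label_counts.values()
        )
        ceiling += p_x * max(label_counts.values()) / n_x
    return h_y_given_x, ceiling

# Overlapping classes: feature value 0 is ambiguous, value 1 is not.
samples = [(0, "a")] * 3 + [(0, "b")] * 1 + [(1, "b")] * 4
h, ceiling = classificability(samples)
```

Here the ambiguity at x = 0 yields H(Y|X) ≈ 0.406 bits and an accuracy ceiling of 0.875: even a perfect classifier must misclassify a quarter of the x = 0 points, matching the paper's notion of intrinsic solvability.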
Mário Franco
School of Systems Science and Industrial Engineering, Binghamton University, Binghamton, USA
Gerardo Febres
School of Systems Science and Industrial Engineering, Binghamton University, Binghamton, USA; Universidad Simón Bolívar, Caracas, Venezuela
Nelson Fernández
School of Systems Science and Industrial Engineering, Binghamton University, Binghamton, USA; Grupo de Investigación en Ecología y Biogeografía, Universidad de Pamplona, Pamplona, Colombia
Carlos Gershenson
Professor of Empire Innovation, Binghamton University
complex systems, self-organizing systems, artificial life, information, urban mobility