🤖 AI Summary
Evaluating the alignment between machine learning models’ abstraction capabilities and human conceptual knowledge remains challenging due to the lack of rigorous, quantifiable benchmarks.
Method: We propose the “Abstraction Alignment” framework, which uses human-encoded, multi-level concept graphs as ground truth and quantifies how much of a model’s behavioral uncertainty those graphs can explain (a minimal sketch follows below). The framework combines concept-graph modeling, uncertainty attribution, cross-level concept alignment measurement, and an evaluation protocol conducted with domain experts.
Contribution/Results: This is the first approach to enable a measurable comparison between model-learned concept relationships and human abstraction structures, supporting hypothesis testing and iterative refinement of abstraction systems. Experiments demonstrate that the framework distinguishes semantically similar misclassifications, makes existing model-quality metrics substantially more interpretable, and, by running the analysis in reverse, uncovers latent gaps and improvement opportunities in human conceptual taxonomies.
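To make the uncertainty-accounting idea concrete, here is a minimal illustrative sketch, not the paper’s implementation: assuming a tree-shaped abstraction graph and a softmax distribution over leaf classes, it lifts probability mass into parent concepts and reports the fraction of leaf-level entropy the human abstraction explains. The toy taxonomy and all names are hypothetical.

```python
import math
from collections import defaultdict

# Toy abstraction graph (a tree, for simplicity); names are illustrative.
PARENT = {
    "oak": "tree", "maple": "tree",       # leaf classes -> parent concepts
    "rose": "flower", "tulip": "flower",
}

def lift(leaf_probs, parent):
    """Move a leaf-level distribution one abstraction level up by
    summing each parent concept's mass over its children."""
    lifted = defaultdict(float)
    for leaf, p in leaf_probs.items():
        lifted[parent[leaf]] += p
    return dict(lifted)

def entropy(dist):
    """Shannon entropy (bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# A model that hesitates between oak and maple is uncertain at the leaf level...
leaf_probs = {"oak": 0.45, "maple": 0.45, "rose": 0.05, "tulip": 0.05}
coarse = lift(leaf_probs, PARENT)  # {"tree": 0.9, "flower": 0.1}

h_leaf, h_coarse = entropy(leaf_probs), entropy(coarse)
# ...but the abstraction explains most of it: the model is torn between
# sibling concepts, which reads as aligned behavior.
explained = 1 - h_coarse / h_leaf
print(f"leaf: {h_leaf:.2f} bits, coarse: {h_coarse:.2f} bits, "
      f"explained: {explained:.0%}")
```

On this toy input, roughly 68% of the model’s uncertainty is absorbed by the parent concepts, so the oak/maple confusion would register as an aligned, low-severity error, whereas an oak/tulip confusion would not.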
📝 Abstract
While interpretability methods identify a model's learned concepts, they overlook the relationships between concepts that make up its abstractions and inform its ability to generalize to new data. To assess whether models have learned human-aligned abstractions, we introduce abstraction alignment, a methodology to compare model behavior against formal human knowledge. Abstraction alignment externalizes domain-specific human knowledge as an abstraction graph, a set of pertinent concepts spanning levels of abstraction. Using the abstraction graph as a ground truth, abstraction alignment measures the alignment of a model's behavior by determining how much of its uncertainty is accounted for by the human abstractions. By aggregating abstraction alignment across entire datasets, users can test alignment hypotheses, such as which human concepts the model has learned and where misalignments recur. In evaluations with experts, abstraction alignment differentiates seemingly similar errors, improves the verbosity of existing model-quality metrics, and uncovers improvements to current human abstractions.
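As a companion to the sketch above, the dataset-level aggregation the abstract describes could be approximated by bucketing low-alignment samples by concept to surface where misalignments recur; the record format and threshold below are assumptions, not the paper’s API.

```python
from collections import defaultdict

def recurring_misalignments(samples, threshold=0.5):
    """Count poorly-explained samples per predicted concept so that
    recurring model/abstraction mismatches stand out."""
    buckets = defaultdict(int)
    for concept, explained in samples:  # (top concept, explained fraction)
        if explained < threshold:
            buckets[concept] += 1
    # Concepts whose uncertainty the human abstraction most often fails to explain.
    return sorted(buckets.items(), key=lambda kv: kv[1], reverse=True)

# Toy usage: confusion around "flower" is rarely explained by the
# abstraction graph, flagging that subtree for review.
samples = [("tree", 0.9), ("tree", 0.7), ("flower", 0.2), ("flower", 0.3)]
print(recurring_misalignments(samples))  # [('flower', 2)]
```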