🤖 AI Summary
Existing approaches lack a systematic way to visualize and understand class confusion and its training dynamics in deep learning. This work proposes GRAPHIC, a model-agnostic method that, for the first time, treats confusion matrices produced by intermediate-layer linear classifiers as adjacency matrices of directed graphs and leverages network-science tools to model and analyze how inter-class confusion evolves throughout training. The approach uncovers key phenomena such as linear separability, label ambiguity, and semantic similarity (exemplified by unexpected confusions like "flatfish" versus "man") and validates these findings through human studies. By offering an interpretable lens into how neural networks learn to distinguish classes, GRAPHIC provides novel insights into the underlying mechanisms of deep learning.
📄 Abstract
Explainable artificial intelligence has emerged as a promising field of research for addressing reliability concerns in artificial intelligence. Despite significant progress in the field, few methods provide a systematic way to visualize and understand how classes are confused and how their relationships evolve as training progresses. In this work, we present GRAPHIC, an architecture-agnostic approach that analyzes neural networks at the class level. It leverages confusion matrices derived from intermediate layers using linear classifiers. We interpret these as adjacency matrices of directed graphs, allowing tools from network science to visualize and quantify learning dynamics across training epochs and intermediate layers. GRAPHIC provides insights into linear class separability, dataset issues, and architectural behavior, revealing, for example, similarities between the classes "flatfish" and "man", as well as labeling ambiguities validated in a human study. In summary, by uncovering real confusions, GRAPHIC offers new perspectives on how neural networks learn. The code is available at https://github.com/Johanna-S-Froehlich/GRAPHIC.
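The core idea of interpreting a confusion matrix as the adjacency matrix of a directed graph can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository above for that); the confusion matrix below is hypothetical, and `networkx` stands in for whatever network-science tooling GRAPHIC actually uses. Rows are true classes, columns are predicted classes, and each off-diagonal entry becomes a weighted directed edge from the true class to the class it is confused with:

```python
import numpy as np
import networkx as nx

# Hypothetical 3-class confusion matrix: rows = true class, cols = predicted.
conf = np.array([
    [50,  3,  2],
    [ 4, 48,  3],
    [ 8,  2, 45],
])

# Drop the diagonal (correct predictions); the remaining counts are
# directed, weighted "confusion" edges: true class -> predicted class.
off_diag = conf.copy()
np.fill_diagonal(off_diag, 0)
G = nx.from_numpy_array(off_diag, create_using=nx.DiGraph)

# Standard network-science quantities then summarize the confusion
# structure, e.g. the weighted in-degree indicates which classes
# attract the most misclassifications.
in_strength = dict(G.in_degree(weight="weight"))
print(in_strength)  # class 0 attracts 4 + 8 = 12 misclassifications
```

Tracking such graph statistics across epochs and layers is what lets this viewpoint expose when classes become linearly separable or remain persistently entangled.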