🤖 AI Summary
This work introduces the **Loss Kernel**, a geometric interpretability probe that measures similarity between data points according to a trained neural network. The kernel is built from the covariance structure of sample-wise losses: two inputs are deemed similar if their losses co-vary under small parameter perturbations that preserve the model's low loss, tying statistical stability in loss space to a kernel-based notion of similarity. On a synthetic multi-task problem, the kernel separates inputs by task exactly as the theory predicts. Applied to Inception-v1, it visualizes the structure of ImageNet, and the clustering it induces aligns closely with the WordNet semantic hierarchy. Together, these results position the Loss Kernel as a principled, geometric, and practical tool for interpreting deep representations and attributing model behavior to data.
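In symbols (notation ours, not taken from the paper): writing $\ell_i(\theta)$ for the loss of sample $i$ at parameters $\theta$, and $q$ for the distribution of low-loss-preserving parameter perturbations around the trained weights, the kernel entry for samples $i$ and $j$ is

$$
K_{ij} \;=\; \operatorname{Cov}_{\theta \sim q}\!\big[\,\ell_i(\theta),\; \ell_j(\theta)\,\big].
$$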
📝 Abstract
We introduce the loss kernel, an interpretability method for measuring similarity between data points according to a trained neural network. The kernel is the covariance matrix of per-sample losses computed under a distribution of low-loss-preserving parameter perturbations. We first validate our method on a synthetic multitask problem, showing it separates inputs by task as predicted by theory. We then apply this kernel to Inception-v1 to visualize the structure of ImageNet, and we show that the kernel's structure aligns with the WordNet semantic hierarchy. This establishes the loss kernel as a practical tool for interpretability and data attribution.
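Below is a minimal sketch of this computation in PyTorch. It assumes a classification model with cross-entropy loss and approximates the distribution of low-loss-preserving perturbations with small isotropic Gaussian noise around the trained parameters; the function name `loss_kernel` and the hyperparameters `n_perturb` and `sigma` are illustrative, not taken from the paper.

```python
import torch
import torch.nn.functional as F

def loss_kernel(model, inputs, targets, n_perturb=64, sigma=1e-2):
    """Estimate a loss kernel: the covariance of per-sample losses under
    random perturbations of the model's parameters.

    Caveat: isotropic Gaussian noise is a crude stand-in for the paper's
    distribution of low-loss-preserving perturbations.
    """
    params = list(model.parameters())
    originals = [p.detach().clone() for p in params]
    loss_rows = []
    with torch.no_grad():
        for _ in range(n_perturb):
            # Draw perturbed parameters theta = theta_0 + sigma * noise.
            for p, p0 in zip(params, originals):
                p.copy_(p0 + sigma * torch.randn_like(p0))
            # Per-sample (unreduced) losses at the perturbed parameters.
            loss_rows.append(F.cross_entropy(model(inputs), targets, reduction="none"))
        # Restore the trained parameters.
        for p, p0 in zip(params, originals):
            p.copy_(p0)
    L = torch.stack(loss_rows)            # (n_perturb, n_samples)
    L = L - L.mean(dim=0, keepdim=True)   # center each sample's losses
    return L.T @ L / (n_perturb - 1)      # (n_samples, n_samples) covariance


if __name__ == "__main__":
    # Toy usage: a small classifier on random data.
    torch.manual_seed(0)
    model = torch.nn.Sequential(
        torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4)
    )
    x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
    K = loss_kernel(model, x, y)
    print(K.shape)  # torch.Size([8, 8])
```

Entry (i, j) of the returned matrix estimates how strongly the losses of samples i and j co-vary over the sampled perturbations, which is the similarity the method reads out.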