Shortcut Features as Top Eigenfunctions of NTK: A Linear Neural Network Case and More

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates how deep learning models tend to rely on shortcut features when dominant patterns exist in the training data, thereby compromising generalization. Within the neural tangent kernel (NTK) framework, the authors model network features as eigenfunctions of the NTK and theoretically demonstrate that shortcut features correspond to eigenfunctions associated with large eigenvalues. They show that the dominance of these features arises not only from the max-margin bias but also from intra-class data variance, and that their influence persists in model outputs after training. The mechanism is rigorously derived for linear networks and empirically validated on ReLU networks and ResNet-18, confirming its applicability across more complex architectures.

📝 Abstract
One of the chronic problems of deep-learning models is shortcut learning: when the majority of the training data are dominated by a certain feature, neural networks prefer to learn that feature even if it does not generalize outside the training set. Based on the framework of the Neural Tangent Kernel (NTK), we analyzed the case of linear neural networks to derive several important properties of shortcut learning. We defined a feature of a neural network as an eigenfunction of the NTK and found that shortcut features correspond to eigenfunctions with larger eigenvalues when the shortcuts stem from an imbalanced number of samples across the clusters of the data distribution. We also showed that features with larger eigenvalues retain a large influence on the network output even after training, due to the data variance within the clusters. This preference for certain features persists even when the margin of the network output is controlled, which shows that the max-margin bias is not the only major cause of shortcut learning. These properties of linear neural networks are empirically shown to extend to more complex networks such as a two-layer fully-connected ReLU network and a ResNet-18.
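To make the spectral claim concrete, below is a minimal NumPy sketch, not the authors' code: the cluster sizes, the data dimension, and the noise level `sigma` are arbitrary illustrative choices. For a one-layer linear model f(x) = w · x the NTK reduces to the plain linear kernel, so the empirical NTK Gram matrix of an imbalanced two-cluster dataset can be eigendecomposed directly, and the top eigenvector is expected to concentrate on the majority (shortcut) cluster.

```python
import numpy as np

rng = np.random.default_rng(0)

# Imbalanced two-cluster toy data: the majority cluster sits along e1
# (the "shortcut" direction), the minority cluster along e2 (the "core"
# direction). sigma controls the intra-class variance the paper points to.
n_major, n_minor, d, sigma = 90, 10, 2, 0.1
X_major = np.array([1.0, 0.0]) + sigma * rng.standard_normal((n_major, d))
X_minor = np.array([0.0, 1.0]) + sigma * rng.standard_normal((n_minor, d))
X = np.vstack([X_major, X_minor])

# For a one-layer linear model f(x) = w . x, the gradient w.r.t. w is x,
# so the empirical NTK Gram matrix is just the linear kernel X X^T.
K = X @ X.T

# Eigenfunctions evaluated on the training set are eigenvectors of K;
# sort them by eigenvalue in descending order.
eigvals, eigvecs = np.linalg.eigh(K)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# The top eigenfunction should put almost all of its mass on the
# majority (shortcut) cluster, i.e. on the first n_major samples.
top = eigvecs[:, 0]
print(f"top two eigenvalues: {eigvals[0]:.1f}, {eigvals[1]:.1f}")
print(f"top-eigenvector mass on majority cluster: "
      f"{np.sum(top[:n_major] ** 2):.2f}")
```

On this toy data the top eigenvalue scales roughly with the majority cluster size while the second scales with the minority size, which is the sense in which the shortcut feature sits at the top of the NTK spectrum.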
Problem

Research questions and friction points this paper is trying to address.

shortcut learning
Neural Tangent Kernel
eigenfunctions
feature preference
sample imbalance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural Tangent Kernel
shortcut learning
eigenfunctions
feature preference
max-margin bias
Jinwoo Lim
Seoul National University
Suhyun Kim
Kyung Hee University
Artificial Intelligence · Data Science · Compilers
Soo-Mook Moon
Seoul National University