🤖 AI Summary
Graph neural networks (GNNs) suffer from miscalibrated confidence estimates in safety-critical applications; existing calibration methods largely rely on one-hop neighborhood statistics or latent embeddings, overlooking fine-grained structural heterogeneity in graphs. To address this, we propose Wavelet-Aware Temperature Scaling (WATS), the first framework to incorporate learnable heat-kernel graph wavelets into GNN calibration. WATS generates node-specific temperature parameters from multi-scale local topological features, enabling post-hoc, node-level calibration without retraining. Its core innovation lies in leveraging heat-kernel wavelets to capture structural heterogeneity, thereby overcoming the topological blindness of prior approaches. Evaluated on seven benchmark datasets, WATS achieves the lowest expected calibration error (ECE), reducing it by up to 42.3% over the best baseline, while decreasing calibration variance by an average of 17.24%. The method is computationally efficient and generalizes well across diverse graph domains.
📝 Abstract
Graph Neural Networks (GNNs) have demonstrated strong predictive performance on relational data; however, their confidence estimates often misalign with actual predictive correctness, posing significant limitations for deployment in safety-critical settings. While existing graph-aware calibration methods seek to mitigate this limitation, they primarily depend on coarse one-hop statistics, such as neighbor-predicted confidence, or latent node embeddings, thereby neglecting the fine-grained structural heterogeneity inherent in graph topology. In this work, we propose Wavelet-Aware Temperature Scaling (WATS), a post-hoc calibration framework that assigns node-specific temperatures based on tunable heat-kernel graph wavelet features. Specifically, WATS harnesses the scalability and topology sensitivity of graph wavelets to refine confidence estimates, all without requiring model retraining or access to neighboring logits or predictions. Extensive evaluations across seven benchmark datasets with varying graph structures and two GNN backbones demonstrate that WATS achieves the lowest Expected Calibration Error (ECE) among all compared methods, outperforming both classical and graph-specific baselines by up to 42.3% in ECE and reducing calibration variance by 17.24% on average compared with graph-specific methods. Moreover, WATS remains computationally efficient, scaling well across graphs of diverse sizes and densities. Code will be released upon publication.
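To make the idea concrete, the following is a minimal sketch of node-specific temperature scaling driven by heat-kernel wavelet features. It assumes one plausible feature choice (the diagonal of the heat kernel `exp(-sL)` at several diffusion scales) and a simple linear-plus-softplus map to temperatures; the paper's exact wavelet parameterization, feature set, and fitting procedure may differ, and the function names here are illustrative, not from the paper.

```python
import numpy as np
from scipy.linalg import expm

def heat_kernel_wavelet_features(A, scales=(0.5, 1.0, 2.0)):
    """Multi-scale local topology features for each node.

    For each diffusion scale s, the heat kernel is exp(-s * L), with L the
    symmetric normalized Laplacian. The diagonal entry (exp(-s L))_vv
    summarizes node v's local structure at that scale.
    (Illustrative feature choice, not necessarily the paper's exact design.)
    """
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, deg ** -0.5, 0.0)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # One column of features per scale; dense expm is fine for small graphs,
    # large graphs would use a polynomial (e.g. Chebyshev) approximation.
    return np.stack([np.diag(expm(-s * L)) for s in scales], axis=1)

def node_temperatures(feats, w, b):
    """Map wavelet features to strictly positive per-node temperatures.

    A linear layer followed by softplus; in practice w and b would be fit
    post hoc on a held-out validation split (hypothetical parameterization).
    """
    z = feats @ w + b
    return np.log1p(np.exp(z)) + 1e-3  # softplus, bounded away from zero

def calibrate(logits, temperatures):
    """Temperature-scale each node's logits, then apply softmax."""
    scaled = logits / temperatures[:, None]
    scaled = scaled - scaled.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(scaled)
    return e / e.sum(axis=1, keepdims=True)
```

Note that, consistent with the abstract, this recipe touches only the adjacency structure and the trained model's logits: no retraining and no access to neighbors' logits or predictions is needed.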