Do Deep Ensembles Actually Capture Uncertainty in Graph Neural Networks?

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

career value

190K/year

🤖 AI Summary

This study investigates the effectiveness of deep ensembles in quantifying uncertainty within message-passing graph neural networks. Through systematic evaluation across seven graph-based tasks, the work demonstrates that performance gains primarily stem from improved optimization stability rather than accurate modeling of epistemic uncertainty. The paper reveals, for the first time, a phenomenon termed “epistemic collapse” in graph neural networks: independently trained ensemble members produce highly similar predictions, thereby failing to capture genuine epistemic uncertainty. By integrating uncertainty decomposition with function-space analysis, the authors show that deep ensembles offer limited improvement in uncertainty estimation on graph-structured data, indicating that their success in conventional domains does not readily transfer to graph machine learning.

📝 Abstract

While deep ensembles are widely considered to be the default method for uncertainty quantification in deep learning, their effectiveness for graph-structured data is often simply assumed based on successes in domains like computer vision. We investigate standard deep ensembles specifically for message-passing graph neural networks. Benchmarking across seven datasets representing varied tasks and complexities, we reveal that ensembles provide surprisingly little improvement over a single model. Instead, the observed marginal gains stem primarily from stabilizing optimization noise in point predictions rather than yielding meaningfully better uncertainty estimates. Through an aleatoric-epistemic decomposition, we identify epistemic collapse: independently trained networks consistently converge to overly similar predictions. Because disagreement is the fundamental mechanism through which ensembles capture epistemic uncertainty, this lack of diversity neutralizes their key advantage. Analyzing this phenomenon further, we suggest this collapse is driven by functional rather than weight-space convexity, where distinct parameter solutions induce almost identical behavior. Our results suggest that deep ensemble success does not seamlessly transfer to graph machine learning.

Problem

Research questions and friction points this paper is trying to address.

uncertainty quantification

deep ensembles

graph neural networks

epistemic uncertainty

ensemble diversity

Innovation

Methods, ideas, or system contributions that make the work stand out.

deep ensembles

graph neural networks

uncertainty quantification