Same Graph, Different Likelihoods: Calibration of Autoregressive Graph Generators via Permutation-Equivalent Encodings

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the inconsistency in likelihood estimation of autoregressive graph generation models caused by their dependence on arbitrary node linearization orders, which undermines evaluation reliability. The authors propose a Linearization Uncertainty (LU) metric that measures likelihood consistency across multiple sampled linearizations of the same graph, thereby evaluating model behavior under permutation equivalence. Their analysis reveals that existing methods often overfit to the training order rather than capturing intrinsic graph structure. Combining the SENT linearization strategy with a Transformer architecture, experiments show that LU correlates strongly with molecular stability on QM9 (AUC = 0.85), substantially outperforming conventional negative log-likelihood (AUC = 0.43). Furthermore, LU exposes calibration errors in current models up to two orders of magnitude larger than previously recognized, motivating an evaluation paradigm grounded in permutation invariance.
📝 Abstract
Autoregressive graph generators define likelihoods via a sequential construction process, but these likelihoods are only meaningful if they are consistent across all linearizations of the same graph. Segmented Eulerian Neighborhood Trails (SENT), a recent linearization method, converts graphs into sequences that can be perfectly decoded and efficiently processed by language models, but admits multiple equivalent linearizations of the same graph. We quantify violations in assigned negative log-likelihood (NLL) using the coefficient of variation across equivalent linearizations, which we call Linearization Uncertainty (LU). Training transformers under four linearization strategies on two datasets, we show that biased orderings achieve lower NLL on their native order but exhibit expected calibration error (ECE) two orders of magnitude higher under random permutation, indicating that these models have learned their training linearization rather than the underlying graph. On the molecular graph benchmark QM9, NLL for generated graphs is negatively correlated with molecular stability (AUC $=0.43$), while LU achieves AUC $=0.85$, suggesting that permutation-based evaluation provides a more reliable quality check for generated molecules. Code is available at https://github.com/lauritsf/linearization-uncertainty
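The abstract defines LU as the coefficient of variation of a model's NLL across equivalent linearizations of the same graph. A minimal sketch of that statistic (function name and inputs are illustrative, not the authors' code; the NLLs would come from scoring several sampled linearizations with the trained model):

```python
import math

def linearization_uncertainty(nlls):
    """Coefficient of variation (std / mean) of per-linearization NLLs.

    `nlls`: negative log-likelihoods the model assigns to several
    equivalent linearizations of one graph. A permutation-consistent
    model yields (near-)identical NLLs, hence LU close to zero.
    """
    n = len(nlls)
    mean = sum(nlls) / n
    var = sum((x - mean) ** 2 for x in nlls) / n  # population variance
    return math.sqrt(var) / mean

# A graph whose likelihood barely changes across orderings has low LU:
print(linearization_uncertainty([10.1, 9.9, 10.0, 10.0]))  # ~0.007
```

A model that has merely memorized its training order would assign a low NLL to one linearization and much higher NLLs to the rest, inflating this ratio.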
Problem

Research questions and friction points this paper is trying to address.

- autoregressive graph generation
- linearization uncertainty
- permutation-equivalent encodings
- likelihood consistency
- graph calibration
Innovation

Methods, ideas, or system contributions that make the work stand out.

- Linearization Uncertainty
- Autoregressive Graph Generation
- Permutation-Equivalent Encodings
- Calibration
- Graph Linearization
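The calibration contribution rests on expected calibration error (ECE), which the abstract reports rising two orders of magnitude under random permutation. As context, here is a generic binned-ECE sketch (standard textbook formulation, not the paper's implementation; names are illustrative):

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: bin-size-weighted mean |accuracy - confidence|.

    `confidences`: predicted probabilities in [0, 1].
    `correct`: 0/1 indicators of whether each prediction was right.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # clamp conf == 1.0
        bins[idx].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(o for _, o in b) / len(b)
        ece += (len(b) / total) * abs(accuracy - avg_conf)
    return ece
```

A well-calibrated model has bin-wise accuracy matching its stated confidence, so ECE near zero; the paper's point is that this property can hold on a model's native linearization yet collapse once the same graphs are re-scored under random permutations.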