🤖 AI Summary
This work addresses a key limitation of Bayesian prediction with Prior-Data Fitted Networks (PFNs): the standard output is a posterior predictive over noisy observations, which conflates irreducible aleatoric noise with epistemic uncertainty about the latent signal. This hinders applications such as active learning and Bayesian optimization, where acquisition should be driven by epistemic uncertainty. The authors show that the two sources cannot in general be identified from the posterior predictive alone, and propose a method that makes the decomposition learnable within the PFN framework. A structured synthetic prior explicitly models a latent signal and a noise function for each task, and a decoupled network architecture predicts the noise-free target and the observation-noise variance with separate heads, which enables exploration strategies based exclusively on epistemic uncertainty. Experiments indicate that epistemic-only acquisition mitigates the failure mode of total-variance exploration in noisy and heteroscedastic settings; decoupled models usually improve over tuned observation-level baselines, most clearly in hyperparameter optimization (HPO), and a decoupled model obtains the best average rank on both HPO and synthetic Bayesian optimization (BO) benchmarks.
📝 Abstract
Prior-Data Fitted Networks (PFNs) amortize Bayesian prediction by meta-learning over a synthetic task prior, but their standard output is a posterior predictive distribution over noisy observations. For sequential decision-making, such as active learning and Bayesian optimization (BO), acquisition should prioritize epistemic uncertainty about the latent signal rather than irreducible aleatoric observation noise. We show that this epistemic–aleatoric split is not identifiable in general from the posterior predictive distribution alone, even when that distribution is known exactly. We then exploit a distinctive advantage of PFNs: because the synthetic data-generating process is under our control, each task can contain an explicit latent signal and noise function, and the generator can provide query-level labels for both the noiseless target and the observation-noise variance. We use these labels to train a decoupled PFN with separate latent-signal and aleatoric heads. The observation-level predictive is induced by convolving the latent-signal distribution with the learned noise model. Empirically, epistemic-only acquisition mitigates the failure mode of total-variance exploration in noisy and heteroscedastic settings. In matched comparisons, decoupled models usually improve over tuned observation-level baselines, with the clearest gains in hyperparameter optimization (HPO); in broader sweeps, a decoupled model obtains the best average rank in both HPO and synthetic BO.
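The convolution step and the difference between total-variance and epistemic-only acquisition can be illustrated with a small numerical sketch. It is not the authors' implementation: it assumes, purely for illustration, that both heads are Gaussian (so that convolving them simply adds variances), and the function names `observation_predictive` and `pick_query` as well as all numbers are hypothetical.

```python
import numpy as np


def observation_predictive(latent_mean, latent_var, noise_var):
    """Convolve a Gaussian latent-signal predictive with zero-mean Gaussian noise.

    Assumption (not from the paper): both heads are Gaussian, so
    N(mu, s2) convolved with N(0, n2) is N(mu, s2 + n2).
    """
    latent_mean = np.asarray(latent_mean, dtype=float)
    total_var = np.asarray(latent_var, dtype=float) + np.asarray(noise_var, dtype=float)
    return latent_mean, total_var


def pick_query(latent_var, noise_var, epistemic_only=True):
    """Return the index of the candidate with the largest uncertainty score."""
    latent_var = np.asarray(latent_var, dtype=float)
    noise_var = np.asarray(noise_var, dtype=float)
    score = latent_var if epistemic_only else latent_var + noise_var
    return int(np.argmax(score))


# Hypothetical head outputs for three candidate query points.
latent_mean = np.array([0.0, 0.0, 0.0])
latent_var = np.array([0.10, 0.50, 0.05])  # epistemic: uncertainty about the latent signal
noise_var = np.array([0.05, 0.05, 2.00])   # aleatoric: observation-noise variance

_, total_var = observation_predictive(latent_mean, latent_var, noise_var)

# Total-variance exploration is drawn to the noisiest point (index 2),
# while epistemic-only exploration picks the point where the signal is
# least known (index 1).
total_choice = pick_query(latent_var, noise_var, epistemic_only=False)
epistemic_choice = pick_query(latent_var, noise_var, epistemic_only=True)

# Non-identifiability in this Gaussian toy case: two different splits
# yield the same observation-level variance, so the predictive alone
# cannot tell them apart.
_, var_split_a = observation_predictive(0.0, 0.50, 0.05)
_, var_split_b = observation_predictive(0.0, 0.05, 0.50)
same_predictive = bool(np.isclose(var_split_a, var_split_b))

print(total_var, total_choice, epistemic_choice, same_predictive)
```

In this toy setting the two acquisition rules disagree only because the third candidate has large observation noise; with homoscedastic noise the rankings would coincide, which is consistent with the abstract's emphasis on noisy and heteroscedastic settings.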