Optimality of Sub-network Laplace Approximations: New Results and Methods

📅 2026-05-09

🤖 AI Summary

Existing sub-network Laplace approximations systematically underestimate predictive variance. They also rely on heuristic parameter selection that neglects parameter interactions and comes without theoretical guarantees. This work rigorously proves that the variance underestimation is inherent to all sub-network Laplace methods, and shows that it decreases monotonically as the retained sub-network expands. To address these limitations, it proposes two approaches with theoretical optimality guarantees. Gradient-Laplace selects parameters by the average squared gradient of the model output with respect to each parameter over a reference dataset. Greedy-Laplace iteratively refines this selection by accounting for off-diagonal interactions in the precision (Hessian) matrix. Theoretical analysis shows that Gradient-Laplace provably outperforms existing heuristics, and extensive experiments indicate that both methods improve uncertainty estimation relative to existing benchmarks across diverse settings.
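
The underestimation result can be checked numerically in a linearized setting. The sketch below is a minimal illustration under stated assumptions, not the paper's code. It assumes a Gaussian Laplace posterior with precision H and a predictive variance g^T H^{-1} g for an output gradient g. It also assumes that a sub-network approximation on an index set S keeps all other parameters fixed at the MAP estimate, which gives g_S^T (H_SS)^{-1} g_S. The helper `predictive_variance` and the random H and g are hypothetical.

```python
import numpy as np


def predictive_variance(H, g, subset=None):
    """Linearized predictive variance under a Gaussian posterior with precision H.

    subset=None: full Laplace variance g^T H^{-1} g.
    Otherwise (assumed sub-network rule): parameters outside `subset` are fixed
    at the MAP, so the precision is the sub-matrix H[S, S] and the variance is
    g_S^T (H_SS)^{-1} g_S.
    """
    if subset is None:
        return float(g @ np.linalg.solve(H, g))
    S = np.asarray(subset)
    H_SS = H[np.ix_(S, S)]  # retained block of the precision matrix
    g_S = g[S]  # output gradient restricted to the sub-network
    return float(g_S @ np.linalg.solve(H_SS, g_S))


rng = np.random.default_rng(0)
p = 8
A = rng.standard_normal((p, p))
H = A @ A.T + p * np.eye(p)  # random symmetric positive-definite precision
g = rng.standard_normal(p)  # hypothetical gradient of the output w.r.t. parameters

full = predictive_variance(H, g)
# Nested sub-networks {0} ⊂ {0, 1} ⊂ ... ⊂ {0, ..., p-1}
nested = [predictive_variance(H, g, list(range(k))) for k in range(1, p + 1)]
```

Fixing parameters amounts to conditioning a Gaussian, and conditioning on more variables can only reduce variance. So every entry of `nested` should be at most `full`, nondecreasing as the subset grows, and equal to `full` once every parameter is retained.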

📝 Abstract

Although the Laplace approximation offers a simple route to uncertainty quantification in deep neural networks, its reliance on inverting large Hessian matrices has motivated a range of computationally feasible low-dimensional or sparse approximations. A prominent class of such methods, sub-network Laplace approximations, constructs surrogates by restricting attention to a small subset of parameters. Existing approaches in this family typically rely on diagonal, layer-wise, or other architectural heuristics for subset selection, which ignore cross-parameter interactions and lack formal optimality guarantees. In this paper, we provide a rigorous theoretical analysis of the sub-network Laplace paradigm. We prove that all sub-network Laplace methods systematically underestimate the predictive variance of the full Laplace posterior, and that this bias decreases monotonically as the retained sub-matrix expands. Leveraging this insight, we propose two principled, analytically grounded sub-network Hessian approximations. Gradient-Laplace selects the parameters with the largest average squared gradients of the model output with respect to the parameters over a reference dataset, while Greedy-Laplace iteratively refines this selection by accounting for off-diagonal interactions in the precision matrix. We establish theoretical guarantees characterizing their optimality properties and show that Gradient-Laplace provably outperforms existing heuristic approaches. Extensive numerical studies across diverse settings indicate that these methods perform strongly relative to existing benchmarks.
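
The two selection rules can be sketched as follows, again only as an illustration under assumptions. `J` is a hypothetical reference-set Jacobian whose row i holds the gradient of a scalar output at x_i with respect to all p parameters, and `H` is a given posterior precision. `gradient_laplace_subset` ranks parameters by mean squared output gradient as described in the abstract; the paper's exact criterion, including any normalization, may differ. `greedy_subset` is a hypothetical forward-selection variant and not necessarily the authors' Greedy-Laplace procedure. At each step it adds the parameter that most increases the average sub-network predictive variance computed from the full sub-matrix H_SS, so off-diagonal precision entries influence the choice.

```python
import numpy as np


def gradient_laplace_subset(J, k):
    """Top-k parameters by mean squared output gradient over the reference set.

    J: array of shape (n_ref, p), J[i, j] = d f(x_i) / d theta_j (assumed given).
    """
    scores = np.mean(J ** 2, axis=0)
    return sorted(np.argsort(scores)[::-1][:k].tolist())


def mean_subnetwork_variance(H, J, subset):
    """Average over reference points of g_S^T (H_SS)^{-1} g_S (linearized)."""
    S = np.asarray(subset)
    H_SS = H[np.ix_(S, S)]
    J_S = J[:, S]
    X = np.linalg.solve(H_SS, J_S.T)  # columns are (H_SS)^{-1} g_S for each point
    return float(np.mean(np.sum(J_S.T * X, axis=0)))


def greedy_subset(H, J, k):
    """Hypothetical greedy forward selection using the full block H_SS."""
    selected = []
    remaining = list(range(H.shape[0]))
    for _ in range(k):
        # Add the candidate that maximizes the retained average variance.
        best = max(remaining,
                   key=lambda j: mean_subnetwork_variance(H, J, selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)


# Usage on random, purely illustrative data
rng = np.random.default_rng(1)
p, n, k = 10, 50, 3
A = rng.standard_normal((p, p))
H = A @ A.T + np.eye(p)  # symmetric positive-definite precision
J = rng.standard_normal((n, p)) * rng.uniform(0.1, 2.0, size=p)
grad_S = gradient_laplace_subset(J, k)
greedy_S = greedy_subset(H, J, k)
```

Maximizing the retained variance is a natural objective under the underestimation result: assuming the sub-network rule above, each sub-network variance is a lower bound on the full Laplace variance, so a larger value means a smaller gap. This naive greedy loop solves one small linear system per candidate per step, so it is only practical for small k unless incremental (e.g. rank-one) updates are used.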

Problem

Research questions and friction points this paper is trying to address.

Laplace approximation

sub-network

predictive variance

parameter selection

optimality

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sub-network Laplace

Uncertainty Quantification

Hessian Approximation