🤖 AI Summary
In ReLU networks, weight-rescaling invariance makes conventional PAC-Bayes generalization bounds inconsistent and potentially vacuous: distinct parameterizations of the same function can yield arbitrarily different complexity estimates.
Method: This work lifts the PAC-Bayes framework to function space, defining a rescaling-invariant complexity measure on a lifted representation in which functionally equivalent parameterizations coincide. Leveraging KL divergence bounds and the data processing inequality, it constructs data-dependent generalization guarantees that are non-vacuous and tighter.
Contribution/Results: The approach removes the influence of parameter redundancy on the bound, substantially reducing complexity-estimation bias on standard architectures. By operating directly on functional equivalence classes rather than on parameterizations, it yields a more discriminative theoretical tool for analyzing generalization in deep learning: a rescaling-invariant, non-vacuous PAC-Bayes bound grounded in function-space geometry.
📝 Abstract
A central challenge in understanding generalization is to obtain non-vacuous guarantees that go beyond worst-case complexity over data or weight space. Among existing approaches, PAC-Bayes bounds stand out as they can provide tight, data-dependent guarantees even for large networks. However, in ReLU networks, rescaling invariances mean that different weight distributions can represent the same function while leading to arbitrarily different PAC-Bayes complexities. We propose to study PAC-Bayes bounds in an invariant, lifted representation that resolves this discrepancy. This paper explores both the guarantees provided by this approach (invariance, tighter bounds via data processing) and the algorithmic aspects of KL-based rescaling-invariant PAC-Bayes bounds.
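The rescaling invariance at the heart of the abstract is easy to verify numerically. The sketch below (illustrative, not from the paper) builds a one-hidden-layer ReLU network, scales the incoming weights of the hidden layer by a factor alpha and the outgoing weights by 1/alpha, and checks that the function is unchanged while the weight norm, the quantity that parameter-space PAC-Bayes complexities typically depend on, changes drastically:

```python
# Minimal sketch: ReLU positive homogeneity means relu(a*z) = a*relu(z)
# for a > 0, so rescaling (W1 -> a*W1, W2 -> W2/a) preserves the network
# function while changing parameter-space weight norms arbitrarily.
import numpy as np

rng = np.random.default_rng(0)

# One-hidden-layer ReLU network: f(x) = W2 @ relu(W1 @ x)
W1 = rng.normal(size=(5, 3))
W2 = rng.normal(size=(1, 5))

def relu(z):
    return np.maximum(z, 0.0)

def f(x, W1, W2):
    return W2 @ relu(W1 @ x)

# Rescale the two layers by alpha and 1/alpha
alpha = 100.0
W1s, W2s = alpha * W1, W2 / alpha

x = rng.normal(size=(3,))
out, out_s = f(x, W1, W2), f(x, W1s, W2s)

# Same function on this input (and on every input)...
assert np.allclose(out, out_s)

# ...but very different squared Frobenius norms, hence very different
# norm-based "complexity" for the same function
norm = np.linalg.norm(W1) ** 2 + np.linalg.norm(W2) ** 2
norm_s = np.linalg.norm(W1s) ** 2 + np.linalg.norm(W2s) ** 2
print(norm, norm_s)
```

Since alpha can be taken arbitrarily large, the norm gap is unbounded, which is why a bound defined on functional equivalence classes, rather than on a particular parameterization, is needed for consistency.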