🤖 AI Summary
This work investigates the expressive power of permutation-equivariant weight-space networks in both weight space and function space, focusing on their universality and inherent limitations. Addressing gaps in existing theory about when universality holds and where it fails, we give a unified proof that the prominent permutation-equivariant architectures are equally expressive under natural assumptions, and establish a formal universality theory in both settings. By combining permutation equivariance, function-approximation theory, and structural properties of weight space, we rigorously characterize the expressive capacity of these networks, identifying the conditions under which universality holds as well as the edge-case regimes that lie beyond their approximation capabilities.
📝 Abstract
Weight-space learning studies neural architectures that operate directly on the parameters of other neural networks. Motivated by the growing availability of pretrained models, recent work has demonstrated the effectiveness of weight-space networks across a wide range of tasks. State-of-the-art weight-space networks rely on permutation-equivariant designs to improve generalization. However, such constraints may come at the cost of expressive power, warranting theoretical investigation. Importantly, unlike other structured domains, weight-space learning targets maps operating on both weight and function spaces, making expressivity analysis particularly subtle. While a few prior works provide partial expressivity results, a comprehensive characterization is still missing. In this work, we address this gap by developing a systematic theory for the expressivity of weight-space networks. We first prove that all prominent permutation-equivariant networks are equivalent in expressive power. We then establish universality in both weight- and function-space settings under mild, natural assumptions on the input weights, and characterize the edge-case regimes where universality no longer holds. Together, these results provide a strong and unified foundation for the expressivity of weight-space networks.
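The permutation symmetry that motivates these equivariant designs is easy to see concretely: relabeling the hidden neurons of an MLP changes its weights but not the function it computes, so a weight-space network should treat such weight configurations identically. The sketch below (not from the paper; an illustrative NumPy example with arbitrary dimensions) verifies this symmetry for a two-layer ReLU MLP.

```python
import numpy as np

# f(x) = W2 @ relu(W1 @ x + b1) + b2. Permuting the hidden neurons
# (rows of W1 and b1, columns of W2) gives different weights that
# realize the exact same function -- the symmetry that
# permutation-equivariant weight-space networks are built to respect.

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 8, 3
W1, b1 = rng.normal(size=(d_hidden, d_in)), rng.normal(size=d_hidden)
W2, b2 = rng.normal(size=(d_out, d_hidden)), rng.normal(size=d_out)

def mlp(x, W1, b1, W2, b2):
    return W2 @ np.maximum(W1 @ x + b1, 0.0) + b2

# Apply a random permutation to the hidden units.
perm = rng.permutation(d_hidden)
W1_p, b1_p, W2_p = W1[perm], b1[perm], W2[:, perm]

x = rng.normal(size=d_in)
assert np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1_p, b1_p, W2_p, b2))
print("Permuted weights compute the identical function.")
```

Because many distinct weight settings encode one function, a network operating on weights ideally produces the same output for all of them; this is why expressivity must be analyzed in both weight space and function space, as the paper does.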