Deep Linear Probe Generators for Weight Space Learning

πŸ“… 2024-10-14
πŸ›οΈ International Conference on Learning Representations
πŸ“ˆ Citations: 6
✨ Influential: 0
πŸ€– AI Summary
Directly inferring training or generalization error from model weights remains challenging due to their high dimensionality and the permutation symmetries between neurons. Method: We propose ProbeGen, a deep linear probe generator that adds a shared deep linear generative module to inject structural inductive bias into the learned input probes, substantially mitigating the overfitting inherent in conventional probe learning. By analyzing the model's responses to these structured probes via forward passes, ProbeGen represents the weight space efficiently. Contribution/Results: Across multiple benchmarks, ProbeGen outperforms state-of-the-art methods while requiring 30–1000× fewer FLOPs, and is more robust. To our knowledge, this is the first work to systematically integrate structured probe generation with weight-space representation learning, establishing a new paradigm for model diagnosis and generalization analysis.

πŸ“ Abstract
Weight space learning aims to extract information about a neural network, such as its training dataset or generalization error. Recent approaches learn directly from model weights, but this presents many challenges, as weights are high-dimensional and include permutation symmetries between neurons. An alternative approach, probing, represents a model by passing a set of learned inputs (probes) through the model and training a predictor on top of the corresponding outputs. Although probing is typically not used as a standalone approach, our preliminary experiment found that a vanilla probing baseline worked surprisingly well. However, we discover that current probe learning strategies are ineffective. We therefore propose Deep Linear Probe Generators (ProbeGen), a simple and effective modification to probing approaches. ProbeGen adds a shared generator module with a deep linear architecture, providing an inductive bias towards structured probes, thus reducing overfitting. While simple, ProbeGen performs significantly better than the state of the art and is very efficient, requiring between 30 and 1000 times fewer FLOPs than other top approaches.
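The probing pipeline the abstract describes can be sketched in a few lines. This is a hypothetical illustration, not the authors' code: the function names, shapes, depth, and initialization below are all assumptions; the essential ideas from the paper are (a) a shared stack of purely linear maps generating all probes from learned latents, and (b) representing a model by its outputs on those probes.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_probe_generator(num_probes=16, latent_dim=32, probe_dim=784, depth=3):
    """Shared deep *linear* generator: learned latent codes plus a stack of
    linear maps with no nonlinearities in between (shapes are illustrative)."""
    latents = rng.standard_normal((num_probes, latent_dim))
    dims = [latent_dim] + [probe_dim] * depth
    weights = [0.01 * rng.standard_normal((d_in, d_out))
               for d_in, d_out in zip(dims[:-1], dims[1:])]
    return latents, weights

def generate_probes(latents, weights):
    """Map latents through the linear stack; the inductive bias comes from the
    depth of the factorization, not from added expressivity."""
    x = latents
    for W in weights:
        x = x @ W
    return x  # (num_probes, probe_dim)

def represent(model_fn, latents, weights):
    """Represent a network by its responses to the generated probes; a
    downstream predictor is trained on this feature vector."""
    probes = generate_probes(latents, weights)
    return model_fn(probes).flatten()
```

In training, the latents, the generator weights, and the downstream predictor would all be optimized jointly against the probed models' labels (e.g. generalization error); only the forward passes through each probed model are needed, which is where the FLOP savings over weight-based methods come from.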
Problem

Research questions and friction points this paper is trying to address.

Improving probe learning strategies for neural network analysis
Addressing ineffectiveness of current weight space probing methods
Reducing computational costs while maintaining model performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Deep linear probe generators reduce overfitting
Shared generator module creates structured probes
Method requires significantly fewer computational operations
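One property worth noting (a standard fact about deep linear networks, shown here as my own illustration rather than a claim from the paper): because the generator has no nonlinearities, the stacked maps collapse into a single matrix by associativity, so after training the probes can be produced, or simply cached, at essentially no extra inference cost, while the extra depth only changes the training dynamics.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.standard_normal((32, 64))
W2 = rng.standard_normal((64, 128))
z = rng.standard_normal((4, 32))   # latent codes

deep = (z @ W1) @ W2               # layer-by-layer forward pass
collapsed = z @ (W1 @ W2)          # single equivalent linear map

print(np.allclose(deep, collapsed))  # True: matrix products associate
```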
Jonathan Kahana
School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel
Eliahu Horwitz
School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel
Imri Shuval
School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel
Yedid Hoshen
The Hebrew University of Jerusalem
Deep Learning · AI · Computer Vision