From Latent Space to Training Data: Explainable Specialization in Minimal MLPs

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates whether training biases can drive interpretable specialization of hidden neurons in minimal Gaussian-activation MLPs and improve reconstruction of the training data from the learned weights. In one-hidden-layer networks whose width equals the dataset size, it compares three structural regularization losses (coverage, separation, and response overlap) against a standard fitting baseline and evaluates their effect on neuronal specialization and prototype reconstruction. Coverage regularization gives the lowest mean reconstruction error at every tested size and raises the prototype-usage specialization ratio, whereas repulsive losses without a compatible attractive term push prototype centers outside the convex hull of the training inputs. Across 480 controlled runs (N = 3 to 100), the results support the design principle that repulsion must be paired with attraction.

📝 Abstract

We study whether training biases can make hidden neurons specialize in minimal one-hidden-layer MLPs, and whether such specialization improves prototype-based reconstruction of the training dataset from the learned weights. We consider Gaussian-activation MLPs of width equal to dataset size and compare three structural losses that respectively encourage coverage of the training samples, separation between neuron-induced prototypes, and low overlap of hidden responses, against the standard fitting baseline. Experiments on uniformly sampled one-dimensional datasets show a stable pattern from N = 3 to N = 100 across 480 controlled runs. Coverage regularization gives the lowest mean reconstruction error at every tested size and raises the prototype-usage specialization ratio relative to the standard baseline, while separation has mixed effects and overlap penalties are systematically harmful. We show that the harm is not an optimization failure: overlap-active approaches fit the data as well as overlap-free ones but route the optimizer to a degenerate equilibrium in which prototype centers are pushed outside the convex hull of the training inputs. Coverage cannot reward this expulsion and acts as an attractor; separation admits it only at large temperature, and overlap admits it at the nominal hyperparameter choice. A direct τ-sweep on the separation-only mask and a prototype-position visualization at N = 100 confirm the mechanism. The findings yield a simple design principle for prototype-recoverability-aware training: every repulsive structural loss must be compensated by a compatible attractor, or it will collapse the latent geometry it was meant to refine.
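
The expulsion mechanism described in the abstract can be illustrated with a small NumPy sketch. Everything below is an assumption made for illustration, not the paper's actual formulation: the prototype of a neuron is taken to be the input where its Gaussian activation peaks (x = -b/w for pre-activation w·x + b), and the three losses (`coverage_loss`, `separation_loss`, `overlap_loss`) are hypothetical stand-ins for the coverage, separation, and overlap terms. The sketch compares prototypes placed on the training inputs with prototypes pushed far outside their convex hull.

```python
import numpy as np

def gaussian_hidden(x, w, b):
    """Hidden responses h[i, j] = exp(-(w[j] * x[i] + b[j])**2) for 1-D inputs."""
    z = np.outer(x, w) + b  # shape (N, H)
    return np.exp(-z ** 2)

def prototypes(w, b):
    """Assumed prototype: the input where each neuron peaks, i.e. w*x + b = 0."""
    return -b / w

def coverage_loss(x, c):
    """Mean squared distance from each training sample to its nearest prototype."""
    d2 = (x[:, None] - c[None, :]) ** 2
    return d2.min(axis=1).mean()

def separation_loss(c, tau):
    """Mean Gaussian similarity over distinct prototype pairs (repulsive term)."""
    d2 = (c[:, None] - c[None, :]) ** 2
    iu = np.triu_indices(len(c), k=1)
    return np.exp(-d2[iu] / tau).mean()

def overlap_loss(h):
    """Mean off-diagonal entry of the hidden-response Gram matrix (repulsive term)."""
    g = h.T @ h / h.shape[0]
    iu = np.triu_indices(g.shape[0], k=1)
    return g[iu].mean()

# Toy 1-D dataset with N = 5 samples and a width-5 hidden layer.
x = np.linspace(0.0, 1.0, 5)
w = np.full(5, 10.0)

# Case 1: one prototype sitting on each training sample.
b_in = -w * x
# Case 2: prototypes spread over [-20, 20], mostly outside the data's convex hull [0, 1].
b_out = -w * np.linspace(-20.0, 20.0, 5)

results = {}
for name, b in [("inside", b_in), ("expelled", b_out)]:
    c = prototypes(w, b)
    h = gaussian_hidden(x, w, b)
    results[name] = {
        "coverage": coverage_loss(x, c),
        "separation": separation_loss(c, tau=1.0),
        "overlap": overlap_loss(h),
    }

for name, vals in results.items():
    print(name, {k: float(v) for k, v in vals.items()})
```

Under these assumed definitions, moving the prototypes outside the hull lowers both repulsive terms while raising the coverage term, which is consistent with the abstract's claim that coverage cannot reward expulsion whereas separation and overlap can.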

Problem

Research questions and friction points this paper is trying to address.

specialization

prototype reconstruction

latent space

training bias

MLP

Innovation

Methods, ideas, or system contributions that make the work stand out.

prototype specialization

structural loss

coverage regularization