Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

📅 2026-03-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates how to efficiently discover diverse task-specific experts within the neighborhood of pre-trained model weights, circumventing the need for complex iterative optimization. Treating the pre-trained weights as the center of a parameter distribution, the authors propose a gradient-free post-training approach that generates candidate models via parallel random perturbations, followed by Top-K selection and majority-vote ensembling. Evaluated on large language models, this method achieves performance comparable to standard post-training strategies—including PPO, GRPO, and evolutionary algorithms—demonstrating that high-performing task experts are densely concentrated around the pre-trained weights. These findings establish a new paradigm for efficient model customization without requiring extensive fine-tuning or gradient-based updates.

📝 Abstract
Pretraining produces a learned parameter vector that is typically treated as a starting point for further iterative adaptation. In this work, we instead view the outcome of pretraining as a distribution over parameter vectors, whose support already contains task-specific experts. We show that in small models such expert solutions occupy a negligible fraction of the volume of this distribution, making their discovery reliant on structured optimization methods such as gradient descent. In contrast, in large, well-pretrained models the density of task-experts increases dramatically, so that diverse, task-improving specialists populate a substantial fraction of the neighborhood around the pretrained weights. Motivated by this perspective, we explore a simple, fully parallel post-training method that samples $N$ parameter perturbations at random, selects the top $K$, and ensembles predictions via majority vote. Despite its simplicity, this approach is competitive with standard post-training methods such as PPO, GRPO, and ES for contemporary large-scale models.
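The sample–select–ensemble procedure from the abstract can be sketched in a few lines. The snippet below is a toy illustration under stated assumptions, not the paper's implementation: `theta_0` stands in for pretrained LLM weights, and `predict`/`task_score` are hypothetical stand-ins for model inference and a task benchmark. The three numbered steps mirror the method: sample $N$ random perturbations, keep the top $K$ by task score, and ensemble their predictions by majority vote.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (hypothetical): a "pretrained" weight vector, evaluation
# inputs, and fixed target labels. In the paper these would be LLM weights
# and a task benchmark.
theta_0 = rng.normal(size=16)
x_eval = rng.normal(size=(8, 16))
target = rng.integers(0, 2, size=8)

def predict(theta, x):
    """Binary prediction from a linear score (toy model)."""
    return (x @ theta > 0).astype(int)

def task_score(theta):
    """Toy task metric: agreement with the target labels."""
    return (predict(theta, x_eval) == target).mean()

N, K, sigma = 64, 5, 0.1  # illustrative hyperparameters, not from the paper

# 1) Sample N random perturbations around the pretrained weights.
#    Each candidate is independent, so this step is fully parallel.
candidates = [theta_0 + sigma * rng.normal(size=theta_0.shape)
              for _ in range(N)]

# 2) Select the top-K candidates by task score.
top_k = sorted(candidates, key=task_score, reverse=True)[:K]

# 3) Ensemble the top-K experts via majority vote on each input.
votes = np.stack([predict(t, x_eval) for t in top_k])  # shape (K, 8)
ensemble = (votes.sum(axis=0) > K // 2).astype(int)
```

Note that no gradients are computed anywhere: the only signal is the scalar `task_score` of each sampled candidate, which is what makes the method gradient-free and embarrassingly parallel.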
Problem

Research questions and friction points this paper is trying to address.

pretraining
task-specific experts
parameter space
neural networks
model scaling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural Thickets
Task-Specific Experts
Parameter Perturbation
Ensemble Voting
Post-Training