Neural Thickets: Diverse Task Experts Are Dense Around Pretrained Weights

📅 2026-03-12
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates how to efficiently discover diverse task-specific experts within the neighborhood of pre-trained model weights, circumventing the need for complex iterative optimization. Treating the pre-trained weights as the center of a parameter distribution, the authors propose a gradient-free post-training approach that generates candidate models via parallel random perturbations, followed by Top-K selection and majority-vote ensembling. Evaluated on large language models, this method achieves performance comparable to standard post-training strategies—including PPO, GRPO, and evolutionary algorithms—demonstrating that high-performing task experts are densely concentrated around the pre-trained weights. These findings establish a new paradigm for efficient model customization without requiring extensive fine-tuning or gradient-based updates.

📝 Abstract
Pretraining produces a learned parameter vector that is typically treated as a starting point for further iterative adaptation. In this work, we instead view the outcome of pretraining as a distribution over parameter vectors, whose support already contains task-specific experts. We show that in small models such expert solutions occupy a negligible fraction of the volume of this distribution, making their discovery reliant on structured optimization methods such as gradient descent. In contrast, in large, well-pretrained models the density of task-experts increases dramatically, so that diverse, task-improving specialists populate a substantial fraction of the neighborhood around the pretrained weights. Motivated by this perspective, we explore a simple, fully parallel post-training method that samples $N$ parameter perturbations at random, selects the top $K$, and ensembles predictions via majority vote. Despite its simplicity, this approach is competitive with standard post-training methods such as PPO, GRPO, and ES for contemporary large-scale models.
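The sample–select–ensemble procedure from the abstract can be sketched in a few lines. The snippet below is a toy illustration under stated assumptions, not the paper's implementation: `theta_0` stands in for pretrained LLM weights, and `predict`/`task_score` are hypothetical stand-ins for model inference and a task benchmark. The three numbered steps mirror the method: sample $N$ random perturbations, keep the top $K$ by task score, and ensemble their predictions by majority vote.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins (hypothetical): a "pretrained" weight vector, evaluation
# inputs, and fixed target labels. In the paper these would be LLM weights
# and a task benchmark.
theta_0 = rng.normal(size=16)
x_eval = rng.normal(size=(8, 16))
target = rng.integers(0, 2, size=8)

def predict(theta, x):
    """Binary prediction from a linear score (toy model)."""
    return (x @ theta > 0).astype(int)

def task_score(theta):
    """Toy task metric: agreement with the target labels."""
    return (predict(theta, x_eval) == target).mean()

N, K, sigma = 64, 5, 0.1  # illustrative hyperparameters, not from the paper

# 1) Sample N random perturbations around the pretrained weights.
#    Each candidate is independent, so this step is fully parallel.
candidates = [theta_0 + sigma * rng.normal(size=theta_0.shape)
              for _ in range(N)]

# 2) Select the top-K candidates by task score.
top_k = sorted(candidates, key=task_score, reverse=True)[:K]

# 3) Ensemble the top-K experts via majority vote on each input.
votes = np.stack([predict(t, x_eval) for t in top_k])  # shape (K, 8)
ensemble = (votes.sum(axis=0) > K // 2).astype(int)
```

Note that no gradients are computed anywhere: the only signal is the scalar `task_score` of each sampled candidate, which is what makes the method gradient-free and embarrassingly parallel.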
Problem

Research questions and friction points this paper is trying to address.

pretraining
task-specific experts
parameter space
neural networks
model scaling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural Thickets
Task-Specific Experts
Parameter Perturbation
Ensemble Voting
Post-Training