🤖 AI Summary
This work addresses a limitation of conventional large language model post-training: it fits a single parameter set, so heterogeneous and conflicting training signals are averaged into one behavior. The authors propose an α-Rényi variational post-training framework that models epistemic uncertainty explicitly by learning a distribution over LoRA adapters on a frozen base model, rather than a point estimate. The tunable α-Rényi divergence in the variational objective balances globally consistent individual models against task-specific specialists, and the same training procedure covers both supervised fine-tuning and preference optimization. The framework softly routes training examples across ensemble members, promotes specialization, and provides actionable uncertainty estimates across tasks.
📝 Abstract
Existing training approaches for large language models learn a single set of parameters from large volumes of data that are typically heterogeneous, conflicting, and often outright contradictory. As a result, the model is forced to compress conflicting goals and inherent uncertainties into a single, averaged pattern of behaviour. We propose an $\alpha$-Rényi variational framework for learning distributions over post-training parameters, offering an uncertainty-aware alternative to deep ensemble approaches. The resulting variational objective interpolates between classical variational Bayes and predictively oriented posterior learning, balancing globally plausible individual models against systems of complementary specialists. We identify local stability criteria, demonstrating how model misspecification can make non-degenerate posterior spread locally favourable, so that contradictory or conflicting data manifest as epistemic uncertainty. We apply our framework to LLM post-training, learning an ensemble of LoRA adapters attached to a shared, frozen base model, providing a scalable training procedure for both supervised fine-tuning and preference optimisation. Our approach enables training examples to be softly routed across ensemble members, promoting model specialisation and providing actionable uncertainty estimates across different tasks.
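For readers unfamiliar with the divergence named in the abstract, the following is a minimal sketch of the standard α-Rényi divergence between two discrete distributions, D_α(q‖p) = log(Σ_i q_i^α p_i^(1−α)) / (α − 1), which tends to the KL divergence as α → 1. It illustrates only the textbook definition, not the paper's variational objective, which the abstract does not spell out. The function name `renyi_divergence` and the example distributions are hypothetical, and the sketch assumes `p` is strictly positive.

```python
import math


def renyi_divergence(q, p, alpha):
    """Standard alpha-Renyi divergence D_alpha(q || p) for discrete distributions.

    Assumptions (illustrative, not taken from the paper):
    q and p are equal-length lists of probabilities summing to 1,
    p is strictly positive, and alpha > 0.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    if abs(alpha - 1.0) < 1e-12:
        # The alpha -> 1 limit is the Kullback-Leibler divergence KL(q || p).
        return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)
    # General case: log of sum_i q_i^alpha * p_i^(1 - alpha), scaled by 1 / (alpha - 1).
    total = sum((qi ** alpha) * (pi ** (1.0 - alpha)) for qi, pi in zip(q, p) if qi > 0)
    return math.log(total) / (alpha - 1.0)


# Hypothetical example: a peaked distribution q against a uniform reference p.
q_example = [0.7, 0.2, 0.1]
p_example = [1.0 / 3.0] * 3
divergences = {a: renyi_divergence(q_example, p_example, a) for a in (0.5, 1.0, 2.0)}
```

The divergence is non-decreasing in α, so the entries of `divergences` grow from α = 0.5 to α = 2. This is one way to see how a single scalar can tune how strongly a mismatch between two distributions is penalised.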