🤖 AI Summary
This work investigates the underlying mechanisms by which domain-specific fine-tuning reshapes the parameter space of large language models (LLMs). Because existing studies offer little clarity on how fine-tuning actually operates, we propose the “tuning vector” framework, which conceptualizes fine-tuning as a directional shift in parameter space and quantifies layer-wise parameter changes via the task vector paradigm. Experiments reveal that fine-tuning sparsely activates *new* representational directions exclusively in MLP layers, while selectively reinforcing *pre-existing critical directions* in attention heads; the overall parameter manifold remains structurally intact, with adjustments confined to a small subspace of representations. Validation on medical-domain LLMs demonstrates substantial improvements in instruction-following and text generation quality. Moreover, cross-domain composition of tuning vectors enhances generalization. To our knowledge, this is the first study to characterize domain fine-tuning as a sparse, structured deformation of the parameter manifold, revealing its intrinsically directional and low-dimensional nature.
📝 Abstract
Large Language Models (LLMs) fine-tuned for specific domains exhibit strong performance; however, the underlying mechanisms by which fine-tuning reshapes their parameter space are not well understood. Prior work focuses primarily on autoregressive or general-purpose instruct models, leaving domain-specialised LLMs under-explored. We present the first systematic study of domain-specific fine-tuning in large medical language models. Our analysis reveals that fine-tuning modifies only a small subset of the representational subspace, essentially preserving the pre-trained model's representations. To interpret these subspace changes, we propose tuning vectors, a novel framework inspired by task vectors that explicitly captures the directional parameter shifts induced by fine-tuning. We demonstrate that these vectors are critical for enhancing both instruction-following and generation quality. Furthermore, combining tuning vectors across different domains yields improved generalisation. Upon closer inspection of directional alignment, we find these vectors primarily write new directional information into the MLP layers of the model, while amplifying existing directions in attention heads. Our findings offer new insights into LLM adaptation and provide a general, interpretable framework for analysing specialisation in large language models.
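The core arithmetic behind tuning vectors can be sketched concretely. Following the task vector paradigm the abstract references, a tuning vector is the layer-wise parameter delta τ = θ_finetuned − θ_base; directional alignment against the base weights distinguishes "new direction written" from "existing direction amplified", and cross-domain composition simply adds two domains' vectors back onto the base. The snippet below is a minimal illustrative sketch on toy numpy arrays, not the paper's implementation; all names (`base`, `medical`, `legal`) and the tiny layer shapes are hypothetical.

```python
import numpy as np

# Hypothetical per-layer weights for a base model and two fine-tuned variants.
rng = np.random.default_rng(0)
base = {"mlp.0": rng.standard_normal(8), "attn.0": rng.standard_normal(8)}
medical = {k: v + 0.1 * rng.standard_normal(8) for k, v in base.items()}
legal = {k: v + 0.1 * rng.standard_normal(8) for k, v in base.items()}

def tuning_vector(finetuned, base):
    """Layer-wise tuning vector: tau = theta_finetuned - theta_base."""
    return {k: finetuned[k] - base[k] for k in base}

def cosine(u, v):
    """Cosine similarity between two flat weight vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

tau_med = tuning_vector(medical, base)
tau_leg = tuning_vector(legal, base)

# Directional alignment of each layer's shift with its base weights:
# |cos| near 0 suggests a new direction was written (MLP-like behaviour
# in the paper's findings); |cos| near 1 suggests an existing direction
# was amplified (attention-head-like behaviour).
alignment = {k: cosine(tau_med[k], base[k]) for k in tau_med}

# Cross-domain composition: add both domains' tuning vectors to the base.
merged = {k: base[k] + tau_med[k] + tau_leg[k] for k in base}
```

In practice the same arithmetic would be applied per layer to a model's `state_dict`; the merged weights here correspond to the abstract's claim that combining tuning vectors across domains improves generalisation.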