🤖 AI Summary
The proliferation of open-source fine-tuned LLMs is hindered by sparse metadata and inconsistent repository structures, impeding model discovery, interpretation, and reuse. To address this, we propose Delta Activations: a representation that encodes a fine-tuned model as the shift in its internal activations relative to its base model. The representation is robust across finetuning settings, approximately additive when finetuning datasets are mixed, and supports embedding tasks from few-shot examples. Experiments show that Delta Activations cluster models effectively by domain and task, revealing structure in the model landscape, and we further explore their use for model selection and merging, facilitating the structured organization and reuse of publicly available fine-tuned models.
📝 Abstract
The success of powerful open-source Large Language Models (LLMs) has enabled the community to create a vast collection of post-trained models adapted to specific tasks and domains. However, navigating and understanding these models remains challenging due to inconsistent metadata and unstructured repositories. We introduce Delta Activations, a method to represent finetuned models as vector embeddings by measuring shifts in their internal activations relative to a base model. This representation allows for effective clustering by domain and task, revealing structure in the model landscape. Delta Activations also demonstrate desirable properties: they are robust across finetuning settings and exhibit an additive property when finetuning datasets are mixed. In addition, we show that Delta Activations can embed tasks via few-shot finetuning, and we further explore their use for model selection and merging. We hope Delta Activations can facilitate the practice of reusing publicly available models. Code is available at https://github.com/OscarXZQ/delta_activations.
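The core idea in the abstract can be sketched in a few lines: embed each finetuned model as the average shift of its activations, relative to the base model, over a fixed set of probe inputs, then compare embeddings by cosine similarity. Below is a toy, self-contained illustration in plain Python. It uses random linear maps as stand-ins for real LLMs, and names like `delta_activation` and the probe set are illustrative assumptions, not the paper's actual API. Two models "finetuned" toward the same task shift should yield more similar Delta Activations than models finetuned toward different tasks.

```python
import math
import random

def delta_activation(base_fn, tuned_fn, probes):
    """Embed a finetuned model as the average activation shift
    (tuned - base) over a fixed set of probe inputs."""
    dim = len(base_fn(probes[0]))
    total = [0.0] * dim
    for x in probes:
        b, t = base_fn(x), tuned_fn(x)
        for i in range(dim):
            total[i] += t[i] - b[i]
    return [v / len(probes) for v in total]

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

# Toy stand-ins for LLMs: linear maps, where "finetuning" adds a
# task-specific weight shift plus small run-to-run noise.
rng = random.Random(0)
d = 16
def rand_mat(scale):
    return [[rng.gauss(0, scale) for _ in range(d)] for _ in range(d)]
def mat_add(*Ms):
    return [[sum(M[i][j] for M in Ms) for j in range(d)] for i in range(d)]

W_base = rand_mat(1.0)
task_a = rand_mat(0.1)   # e.g. a "math" finetuning direction
task_b = rand_mat(0.1)   # e.g. a "code" finetuning direction
W_a1 = mat_add(W_base, task_a, rand_mat(0.01))  # two independent runs
W_a2 = mat_add(W_base, task_a, rand_mat(0.01))  # on the same task
W_b1 = mat_add(W_base, task_b, rand_mat(0.01))  # a different task

probes = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(8)]
base = lambda x: matvec(W_base, x)
emb_a1 = delta_activation(base, lambda x: matvec(W_a1, x), probes)
emb_a2 = delta_activation(base, lambda x: matvec(W_a2, x), probes)
emb_b1 = delta_activation(base, lambda x: matvec(W_b1, x), probes)

sim_same = cosine(emb_a1, emb_a2)  # same task, different runs
sim_diff = cosine(emb_a1, emb_b1)  # different tasks
```

In this toy setup, `sim_same` comes out much larger than `sim_diff`, which is the property the paper exploits for clustering and retrieval; the real method applies the same recipe to hidden states of actual base and finetuned LLMs.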