🤖 AI Summary
To address the challenge of sustaining personalized adaptation for downstream users amid frequent large language model (LLM) updates, this paper proposes PortLLM: a training-free, lightweight model-patching mechanism whose patches are reusable across model versions. PortLLM leverages parameter-space alignment and incremental perturbation modeling to transfer domain-specific knowledge acquired on legacy LLMs to newer versions, without additional training or fine-tuning data. It is compatible with prevalent parameter-efficient fine-tuning (PEFT) paradigms such as LoRA and supports mainstream open-source architectures including Mistral, Llama, and Gemma. Evaluated on seven benchmark datasets (e.g., BoolQ, GSM8K), PortLLM matches LoRA fine-tuning performance while reducing peak GPU memory consumption by up to 12.2×. A theoretical analysis formally supports patch portability across model versions. To our knowledge, PortLLM is the first approach enabling low-overhead, sustainable personalization of evolving LLMs without retraining.
📝 Abstract
As large language models (LLMs) increasingly shape the AI landscape, fine-tuning pretrained models has become more popular than in the pre-LLM era for achieving optimal performance on domain-specific tasks. However, pretrained LLMs such as ChatGPT are periodically evolved (i.e., model parameters are frequently updated), making it challenging for downstream users with limited resources to keep up with fine-tuning the newest LLMs for their domain applications. Even though fine-tuning costs have been reduced thanks to innovations in parameter-efficient fine-tuning such as LoRA, not all downstream users have adequate computing resources for frequent personalization. Moreover, access to fine-tuning datasets, particularly in sensitive domains such as healthcare, could be time-restricted, making it crucial to retain the knowledge encoded in earlier fine-tuning rounds for future adaptation. In this paper, we present PortLLM, a training-free framework that (i) creates an initial lightweight model update patch to capture domain-specific knowledge, and (ii) allows subsequent seamless plugging of that patch for the continual personalization of the evolved LLM at minimal cost. Our extensive experiments cover seven representative datasets, from easier question-answering tasks {BoolQ, SST2} to harder reasoning tasks {WinoGrande, GSM8K}, and models including {Mistral-7B, Llama2, Llama3.1, and Gemma2}, validating the portability of our designed model patches and showcasing the effectiveness of our proposed framework. For instance, PortLLM achieves performance comparable to LoRA fine-tuning with reductions of up to 12.2× in GPU memory usage. Finally, we provide theoretical justifications for the portability of our model update patches, offering new insights into the theoretical dimension of LLM personalization.
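The two-step idea in the abstract can be illustrated with a toy sketch: a low-rank (LoRA-style) patch is extracted from fine-tuning on the old model, then added directly to the evolved model's weights with no retraining. This is a minimal NumPy illustration of the concept only; the matrix sizes, the `0.01` drift scale, and all variable names are illustrative assumptions, not the paper's actual procedure or notation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2  # toy hidden size and low-rank patch rank (illustrative values)

# Step (i): a LoRA-style patch learned on the OLD pretrained weights,
# capturing domain-specific knowledge as a low-rank update.
W_old = rng.normal(size=(d, d))
B = rng.normal(size=(d, r))
A = rng.normal(size=(r, d))
delta_W = B @ A  # lightweight model update patch

# The provider releases an evolved model; here its drift from W_old is
# simulated as a small random perturbation (an assumption of this sketch).
W_new = W_old + 0.01 * rng.normal(size=(d, d))

# Step (ii): plug the old patch into the evolved model, training-free.
W_personalized = W_new + delta_W

# The ported model differs from the old personalized model only by the
# version drift, which is small here by construction.
gap = np.linalg.norm(W_personalized - (W_old + delta_W))
print(gap)
```

The sketch only shows why porting can be cheap: applying a stored `delta_W` is a single addition per weight matrix, requiring no optimizer state, gradients, or fine-tuning data on the new model.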