🤖 AI Summary
Large language models (LLMs) face significant challenges in task adaptation under resource-constrained and closed-source API settings, where conventional parameter-efficient fine-tuning (PEFT) methods are impractical: they require direct access to model parameters and incur high computational overhead.
Method: This paper proposes a lightweight, parameter-free knowledge injection framework that enables task-specific adaptation without accessing the LLM’s internal parameters. Its core innovation is the “Specific Small Model (SSM)” collaboration paradigm, which integrates knowledge distillation from the LLM, distribution-aware task modeling, and zero-parameter coupling between the SSM and the LLM.
Contribution/Results: Experiments demonstrate that the approach matches PEFT-level performance across diverse downstream tasks while reducing GPU memory consumption by over 90% and inference latency by 85%. Crucially, it operates entirely in black-box API settings, requiring no access to model weights, gradients, or architecture, and thus integrates seamlessly with proprietary, closed-source LLM APIs.
📝 Abstract
While the enormous parameter scale endows Large Models (LMs) with unparalleled performance, it also limits their adaptability to specific tasks. Parameter-Efficient Fine-Tuning (PEFT) has emerged as a critical approach for effectively adapting LMs to a diverse range of downstream tasks. However, existing PEFT methods face two primary challenges: (1) High resource cost. Although PEFT methods significantly reduce resource demands compared to full fine-tuning, they still require substantial time and memory, making them impractical in resource-constrained environments. (2) Parameter dependency. PEFT methods rely on updating a subset of parameters associated with LMs to incorporate task-specific knowledge. Yet, amid increasing competition in the LM landscape, many companies have adopted closed-source policies for their leading models, offering access only via Application Programming Interfaces (APIs). Moreover, API-based fine-tuning is often cost-prohibitive and difficult to sustain, as the fine-tuning process of LMs is extremely slow. Although small models perform far worse than LMs in general, they can achieve superior results on particular distributions while requiring only minimal resources. Motivated by this insight, we propose Easy Adaptation (EA), which designs Specific Small Models (SSMs) to complement the underfitted data distribution of LMs. Extensive experiments show that EA matches the performance of PEFT on diverse tasks without accessing LM parameters, while requiring only minimal resources.
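The paper does not include implementation details here, but the core idea of zero-parameter coupling between an SSM and a black-box LLM can be sketched as a simple confidence-gated router: a small model trained on the target distribution answers inputs it is confident about, and everything else falls back to the opaque LLM API. All names below (`make_easy_adapter`, `ssm_predict`, `llm_api`, the 0.9 threshold) are hypothetical illustrations, not the authors' actual interface:

```python
# Hedged sketch of the Easy Adaptation idea: a Specific Small Model (SSM)
# handles inputs from the distribution it was trained on; everything else
# is deferred to a black-box LLM API. No LLM weights, gradients, or
# architecture details are touched (zero-parameter coupling).

from typing import Callable, Tuple

def make_easy_adapter(
    ssm_predict: Callable[[str], Tuple[str, float]],  # returns (label, confidence)
    llm_api: Callable[[str], str],                    # opaque API endpoint
    threshold: float = 0.9,                           # assumed gating threshold
) -> Callable[[str], str]:
    """Route by SSM confidence; the LLM is only ever queried, never updated."""
    def predict(text: str) -> str:
        label, confidence = ssm_predict(text)
        if confidence >= threshold:   # in-distribution: trust the cheap SSM
            return label
        return llm_api(text)          # out-of-distribution: fall back to the LLM
    return predict

# Toy usage with stub models (illustration only):
ssm = lambda t: ("positive", 0.95) if "great" in t else ("unknown", 0.1)
llm = lambda t: "negative"

adapter = make_easy_adapter(ssm, llm)
print(adapter("great movie"))  # confident SSM answer: "positive"
print(adapter("meh"))          # deferred to the LLM stub: "negative"
```

Because the LLM appears only as a callable, this pattern works with any closed-source API; the only trainable component is the SSM, which is where the paper's resource savings would come from.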