🤖 AI Summary
This work addresses the challenge of deploying numerous task-specific adapters for on-device large language models under stringent memory constraints. To this end, it introduces a data-driven adapter clustering and fusion approach—the first of its kind—that requires only ten samples per task. By iteratively optimizing adapter representations, the method clusters similar adapters and merges those within the same cluster into a shared multi-task adapter. Applied to parameter-efficient adapters such as LoRA, this strategy significantly enhances cross-task generalization while adhering to tight storage budgets. Extensive experiments demonstrate the effectiveness and practicality of the proposed method on resource-constrained devices, establishing a new paradigm for efficient multi-task adapter deployment.
📝 Abstract
On-device large language models commonly employ task-specific adapters (e.g., LoRAs) to deliver strong performance on downstream tasks. While storing all available adapters is impractical due to memory constraints, mobile devices typically have sufficient capacity to store a limited number of these parameters. This raises a critical challenge: how to select representative adapters that generalize well across multiple tasks, a problem that remains unexplored in the existing literature. We propose D2C, a novel adapter-clustering method that leverages minimal task-specific examples (e.g., 10 per task) and employs an iterative optimization process to refine cluster assignments. The adapters within each cluster are merged, creating multi-task adapters deployable on resource-constrained devices. Experimental results demonstrate that our method effectively boosts performance under the considered storage budgets.
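The abstract does not give D2C's details, but the overall cluster-then-merge pipeline can be illustrated with a minimal sketch. The following is an assumption-laden stand-in, not the paper's method: adapters are represented by feature vectors (e.g., derived from the few task-specific examples), clustered with a naive k-means loop, and the LoRA factors within a cluster are merged by simple averaging. The function names, the k-means clustering, and the averaging merge are all illustrative choices.

```python
import numpy as np

def cluster_adapters(reps: np.ndarray, k: int, iters: int = 20) -> np.ndarray:
    """Assign each adapter to one of k clusters via a naive k-means loop.

    `reps` is an (n_adapters, dim) array of per-adapter representation
    vectors; how such vectors are built from the ~10 samples per task is
    a detail this sketch does not model.
    """
    centers = reps[:k].copy()  # deterministic init: first k adapters
    for _ in range(iters):
        # Assign each adapter to its nearest centroid (squared L2).
        dists = ((reps[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = np.argmin(dists, axis=1)
        # Update centroids; keep the old center if a cluster empties.
        centers = np.stack([
            reps[assign == c].mean(0) if (assign == c).any() else centers[c]
            for c in range(k)
        ])
    return assign

def merge_lora_cluster(adapters):
    """Merge the LoRAs of one cluster by averaging their (A, B) factors.

    Plain averaging is a naive stand-in for the paper's fusion step.
    """
    A = np.mean([a for a, _ in adapters], axis=0)
    B = np.mean([b for _, b in adapters], axis=0)
    return A, B
```

Under a fixed storage budget, only the k merged adapters need to be kept on device; at inference time a request for any task is served by the merged adapter of that task's cluster.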