🤖 AI Summary
Existing parameter-efficient fine-tuning (PEFT) methods typically require separate training for each task, which limits their use in multi-task and compositional attribute-control settings. This work studies plug-and-play multi-attribute controllable text generation in which the outputs of multiple independently trained QLoRA modules are summed at inference, with no additional training. Output composition is evaluated alongside weight composition and joint multi-dataset training across three large language models and three sets of datasets (sentiment control, topic control, and multi-attribute control). Summing module outputs consistently matches or outperforms the alternative strategies. On the single-task sentiment-control test set, three-module output composition even improves on single-task specialized modules by an average of 2 percentage points across all models.
📝 Abstract
Parameter-efficient fine-tuning (PEFT) techniques offer task-specific fine-tuning at a fraction of the cost of full fine-tuning, but require separate fine-tuning for every new task or combination of tasks. In this paper, we explore three ways of generalising beyond single-task training/inference: (i) training on combinations of multiple, related datasets; (ii) at inference, composing the weight matrices of separately trained PEFT modules; and (iii) at inference, composing the outputs of separately trained PEFT modules. We test these approaches on three different LLMs, using QLoRA as the PEFT technique, and on three sets of controlled text generation datasets for sentiment control, topic control, and multi-attribute control. We find that summing PEFT module outputs is a particularly strong composition method, which consistently either outperforms or matches the performance of alternative approaches. This is the case even when comparing against single-task specialised modules on the single-task test set, where three-module output composition achieves an average performance increase of 2 percentage points across all models for sentiment control.
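To make the idea of composing module outputs concrete, the following is a minimal sketch on a single toy linear layer. It is not the paper's implementation. The helper names (`compose_outputs`, `compose_factors`), the rank-1 toy matrices, and the choice of factor-wise summation as a weight-level variant are all assumptions made for illustration, since the abstract does not specify how weight matrices are composed. Each LoRA-style module is modelled as a pair of low-rank factors (A, B) whose contribution to the layer output is B(Ax).

```python
# Toy sketch (assumed setup, not the paper's code): one linear layer with a
# frozen base weight W and two LoRA-style modules, each a pair (A, B).

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vec_add(*vs):
    """Element-wise sum of one or more vectors."""
    return [sum(vals) for vals in zip(*vs)]

def mat_add(*Ms):
    """Element-wise sum of one or more equally shaped matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*Ms)]

def lora_delta(module, x):
    """Output contribution of one module: B (A x)."""
    A, B = module
    return matvec(B, matvec(A, x))

def compose_outputs(W, modules, x):
    """Output composition: base output plus the sum of every module's output."""
    return vec_add(matvec(W, x), *(lora_delta(m, x) for m in modules))

def compose_factors(W, modules, x):
    """Hypothetical weight-level variant: sum the A factors and the B factors
    separately, then apply the merged pair as a single module."""
    A_sum = mat_add(*(A for A, _ in modules))
    B_sum = mat_add(*(B for _, B in modules))
    return vec_add(matvec(W, x), matvec(B_sum, matvec(A_sum, x)))

# Illustrative values only: identity base weight and two rank-1 modules.
W = [[1, 0], [0, 1]]
sentiment_module = ([[1, 0]], [[1], [0]])  # (A, B), hypothetical
topic_module = ([[0, 1]], [[0], [1]])      # (A, B), hypothetical
modules = [sentiment_module, topic_module]
x = [1, 2]

print(compose_outputs(W, modules, x))  # [2, 4]
print(compose_factors(W, modules, x))  # [4, 5]
```

In this toy case the two strategies disagree because summing the factors introduces cross terms such as B_1 A_2 that neither module was trained with. Within a single linear layer, summing module outputs is algebraically the same as adding the sum of the products B_i A_i to the base weight, so any difference between output and weight composition depends on exactly how the weights are merged.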