From Instance Training to Instruction Learning: Task Adapters Generation from Instructions

📅 2024-06-18
🏛️ Neural Information Processing Systems
📈 Citations: 1
Influential: 0
🤖 AI Summary
Instruction fine-tuning (IFT) relies heavily on large amounts of annotated task instances and generalizes poorly across tasks when such instances are scarce. To address this, the paper proposes Task Adapters Generation from Instructions (TAGI), a framework that generates task-specific adapter parameters directly from natural-language task instructions, with no retraining for unseen tasks. The method combines three components: (1) a hypernetwork that maps task instructions to adapter parameters; (2) a two-stage training scheme consisting of hypernetwork pretraining and finetuning; and (3) knowledge distillation that aligns the labels, output logits, and adapter parameters of the instruction-based model with those of task-specific models trained on instances. Evaluated on the Super-Natural Instructions and P3 datasets, TAGI matches or outperforms traditional meta-trained models and other hypernetwork models while significantly reducing computational requirements.

📝 Abstract
Large language models (LLMs) have acquired the ability to solve general tasks by utilizing instruction finetuning (IFT). However, IFT still relies heavily on instance training of extensive task data, which greatly limits the adaptability of LLMs to real-world scenarios where labeled task instances are scarce and broader task generalization becomes paramount. Contrary to LLMs, humans acquire skills and complete tasks not merely through repeated practice but also by understanding and following instructional guidelines. This paper is dedicated to simulating human learning to address the shortcomings of instance training, focusing on instruction learning to enhance cross-task generalization. Within this context, we introduce Task Adapters Generation from Instructions (TAGI), which automatically constructs the task-specific model in a parameter generation manner based on the given task instructions without retraining for unseen tasks. Specifically, we utilize knowledge distillation to enhance the consistency between TAGI developed through Learning with Instruction and task-specific models developed through Training with Instance, by aligning the labels, output logits, and adapter parameters between them. TAGI is endowed with cross-task generalization capabilities through a two-stage training process that includes hypernetwork pretraining and finetuning. We evaluate TAGI on the Super-Natural Instructions and P3 datasets. The experimental results demonstrate that TAGI can match or even outperform traditional meta-trained models and other hypernetwork models, while significantly reducing computational requirements.
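The three-way alignment described in the abstract (labels, output logits, adapter parameters) can be illustrated with a small loss function. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the loss forms or their weighting, so cross-entropy for labels, KL divergence for logits, mean squared error for adapter parameters, and the names `alignment_loss`, `w_logits`, and `w_params` are all hypothetical choices made here for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def cross_entropy(student_logits, label):
    """Label alignment: negative log-probability of the gold label."""
    return -log_softmax(student_logits)[label]


def kl_divergence(teacher_logits, student_logits):
    """Logit alignment: KL(teacher || student) over output distributions."""
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def parameter_mse(teacher_params, student_params):
    """Parameter alignment: mean squared error between flattened adapters."""
    diffs = [(t - s) ** 2 for t, s in zip(teacher_params, student_params)]
    return sum(diffs) / len(diffs)


def alignment_loss(label, teacher_logits, student_logits,
                   teacher_params, student_params,
                   w_logits=1.0, w_params=1.0):
    """Weighted sum of the three terms (the weights are assumptions)."""
    return (cross_entropy(student_logits, label)
            + w_logits * kl_divergence(teacher_logits, student_logits)
            + w_params * parameter_mse(teacher_params, student_params))
```

In this sketch the "teacher" stands for a task-specific model trained on instances and the "student" for the model whose adapter was generated from instructions. When the student reproduces the teacher's logits and adapter parameters exactly, the last two terms vanish and only the label term remains.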

Problem

Research questions and friction points this paper is trying to address.

Enhance LLMs' cross-task generalization
Reduce reliance on extensive task data
Automate task-specific model construction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Task Adapters Generation
Knowledge Distillation Enhancement
Hypernetwork Pretraining and Finetuning
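
The idea of generating task adapters from instructions can be sketched as a hypernetwork that turns an instruction embedding into adapter weights, which are then added to a frozen layer. This is an illustrative sketch only: the low-rank (LoRA-style) adapter shape, the single linear hypernetwork, and every name below (`make_hypernetwork`, `generate_adapter`, `adapted_forward`) are assumptions made for the example and are not taken from the paper.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def make_hypernetwork(embed_dim, in_dim, out_dim, rank, seed=0):
    """Randomly initialised linear map: instruction embedding -> flat adapter."""
    rng = random.Random(seed)
    n_params = rank * in_dim + out_dim * rank
    return [[rng.gauss(0.0, 0.02) for _ in range(embed_dim)]
            for _ in range(n_params)]


def generate_adapter(hyper_weights, instruction_embedding, in_dim, out_dim, rank):
    """Generate low-rank adapter matrices (down, up) from an instruction embedding."""
    flat = matvec(hyper_weights, instruction_embedding)
    split = rank * in_dim
    down = [flat[i * in_dim:(i + 1) * in_dim] for i in range(rank)]
    up = [flat[split + i * rank:split + (i + 1) * rank] for i in range(out_dim)]
    return down, up


def adapted_forward(frozen_weight, adapter, x):
    """Frozen layer output plus the generated low-rank update: W x + up (down x)."""
    down, up = adapter
    base = matvec(frozen_weight, x)
    delta = matvec(up, matvec(down, x))
    return [b + d for b, d in zip(base, delta)]
```

In this setup an unseen task needs only a new instruction embedding: the hypernetwork produces its adapter in one forward pass, and the frozen weights are left unchanged.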
Huanxuan Liao
Institute of Automation, Chinese Academy of Sciences
Natural Language Processing, Large Language Model, Long Context Modeling
Yao Xu
The Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Shizhu He
The Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Yuanzhe Zhang
Institute of Automation, Chinese Academy of Sciences
Natural Language Processing
Yanchao Hao
Platform and Content Group, Tencent, Beijing, China
Shengping Liu
Unisound, Beijing, China
Kang Liu
The Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
Jun Zhao
The Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China