Bitune: Bidirectional Instruction-Tuning

📅 2024-05-23
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
🤖 AI Summary
Decoder-only large language models rely exclusively on unidirectional causal attention, which limits how richly they can represent a prompt. To address this, Bitune is an instruction-tuning method for decoder-only architectures: during prompt encoding, it runs parallel causal and bidirectional attention pathways with separate parameters, and fuses their outputs via learnable weights to guide autoregressive generation. The approach is architecture-agnostic and does not depend on any specific parameter-efficient fine-tuning (PEFT) technique. Experiments demonstrate consistent zero-shot gains over baselines on commonsense reasoning, arithmetic, and language-understanding tasks, and ablation studies confirm the contribution of the bidirectional attention, the dual-path design, and the learnable fusion mechanism.
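The dual-path idea above can be illustrated with attention masks: both pathways see the same prompt, but one keeps the usual causal mask while the other lifts it within the prompt span. A minimal sketch (the helper name and return convention are assumptions, not the paper's code):

```python
import numpy as np

def bitune_masks(prompt_len: int, total_len: int):
    """Return (causal, bidirectional) boolean attention masks.

    True means "may attend". In the bidirectional pathway, every prompt
    token may attend to every other prompt token; tokens generated after
    the prompt remain strictly causal in both pathways.
    """
    causal = np.tril(np.ones((total_len, total_len), dtype=bool))
    bidirectional = causal.copy()
    # Lift the causal restriction inside the prompt span only.
    bidirectional[:prompt_len, :prompt_len] = True
    return causal, bidirectional
```

For example, with a 3-token prompt in a 5-token sequence, prompt token 0 may attend to prompt token 2 in the bidirectional mask but not in the causal one, while generated tokens 3 and 4 stay causal in both.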

📝 Abstract
We introduce Bitune, a method that improves instruction-tuning of pretrained decoder-only large language models, leading to consistent gains on downstream tasks. Bitune applies both causal and bidirectional attention to the prompt, to obtain a better representation of the query or instruction. We realize this by introducing two sets of parameters, for which we apply parameter-efficient finetuning techniques. These causal and bidirectional features are then combined into a weighted average with trainable coefficients, which is subsequently used to generate new tokens. We demonstrate significant improvements in zero-shot performance on commonsense reasoning, arithmetic, and language understanding tasks, while extensive ablation studies validate the role of each component and demonstrate the method's agnosticism to different PEFT techniques.
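The weighted average described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the scalar `theta` stands in for the trainable coefficient (one such parameter per feature group in practice), and a sigmoid keeps the mixing weight in (0, 1):

```python
import numpy as np

def fuse_features(causal_feats, bidir_feats, theta):
    """Combine causal and bidirectional prompt features (sketch).

    theta: trainable scalar; sigmoid(theta) is the weight on the
    bidirectional pathway. theta = 0 gives an equal 50/50 mix.
    """
    alpha = 1.0 / (1.0 + np.exp(-theta))  # sigmoid
    return alpha * bidir_feats + (1.0 - alpha) * causal_feats
```

During finetuning, `theta` is learned jointly with the two PEFT parameter sets, so the model can adjust per layer how much the bidirectional representation of the prompt contributes to generation.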
Problem

Research questions and friction points this paper is trying to address.

Enhancing decoder-only LLMs with bidirectional attention
Improving performance on reasoning and understanding tasks
Enabling bidirectional information flow in prompt processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Incorporates bidirectional attention into decoder-only models
Enhances performance on reasoning and understanding tasks
Compatible with parameter-efficient and full finetuning techniques
Authors
Dawid J. Kopiczko (Vrije Universiteit Amsterdam)
Tijmen Blankevoort (Meta, GenAI Llama Foundation Models; Machine Learning, Deep Learning)
Yuki Markus Asano (University of Amsterdam)