🤖 AI Summary
Decoder-only large language models suffer from limited representational capacity due to their exclusive reliance on unidirectional causal attention. To address this, we propose the first instruction-tuning method specifically designed for pure decoder architectures: during the prompt encoding phase, we introduce parallel causal and bidirectional attention pathways with separate parameter sets; their outputs are dynamically fused via learnable weights to guide autoregressive generation. Our approach is architecture-agnostic and does not depend on any specific parameter-efficient fine-tuning (PEFT) technique. Experiments demonstrate consistent zero-shot performance gains over baselines across commonsense reasoning, arithmetic, and language understanding tasks. Ablation studies confirm the effectiveness and necessity of the bidirectional attention, the dual-path design, and the learnable fusion mechanism.
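The core difference between the two pathways is the attention mask applied while encoding the prompt. A minimal sketch of the two mask patterns (illustrative only; the paper's implementation details may differ):

```python
import numpy as np

def prompt_attention_masks(prompt_len):
    """Build the two attention masks used over the prompt.

    Causal pathway: token i may attend only to positions <= i
    (standard lower-triangular decoder mask).
    Bidirectional pathway: every prompt token attends to the full prompt.
    """
    causal = np.tril(np.ones((prompt_len, prompt_len), dtype=bool))
    bidirectional = np.ones((prompt_len, prompt_len), dtype=bool)
    return causal, bidirectional

causal, bidir = prompt_attention_masks(4)
# In the causal mask, position 0 cannot see position 3;
# in the bidirectional mask, it can.
```

Only the prompt is encoded bidirectionally; generation of new tokens remains strictly autoregressive.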
📝 Abstract
We introduce Bitune, a method that improves instruction-tuning of pretrained decoder-only large language models, leading to consistent gains on downstream tasks. Bitune applies both causal and bidirectional attention to the prompt to obtain a better representation of the query or instruction. We realize this by introducing two separate sets of parameters, which we train with parameter-efficient fine-tuning (PEFT) techniques. The causal and bidirectional features are then combined into a weighted average with trainable coefficients, which is subsequently used to generate new tokens. We demonstrate significant improvements in zero-shot performance on commonsense reasoning, arithmetic, and language understanding tasks, while extensive ablation studies validate the role of each component and show that the method is agnostic to the choice of PEFT technique.
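The fusion step described above can be sketched as a per-feature weighted average of the two pathways' outputs. Here is a minimal NumPy illustration; the gating via a sigmoid-squashed trainable scalar is an assumption for the sketch, and the variable names (`theta`, `h_causal`, `h_bidir`) are illustrative rather than taken from the paper's code:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_features(h_causal, h_bidir, theta):
    """Combine causal and bidirectional prompt features.

    theta: a trainable mixing parameter; the sigmoid keeps the
    coefficient alpha in (0, 1) so the result stays a convex
    combination of the two feature sets.
    """
    alpha = sigmoid(theta)
    return alpha * h_bidir + (1.0 - alpha) * h_causal

# With theta = 0, alpha = 0.5: an equal blend of both pathways.
h = fuse_features(np.zeros(4), np.ones(4), 0.0)
```

The fused representation of the prompt is then consumed by the usual autoregressive decoding loop.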