SelectIT: Selective Instruction Tuning for LLMs via Uncertainty-Aware Self-Reflection

📅 2024-02-26
📈 Citations: 10
Influential: 0
🤖 AI Summary
High-quality data selection for instruction tuning (IT) is costly, often requiring external models or human annotations. Method: We propose SelectIT, an intrinsic uncertainty-aware self-reflection data selection mechanism leveraging large language models’ (LLMs) own output entropy and confidence estimates—without auxiliary models, labels, or additional training. Through self-reflective prompting and dynamic quality assessment, SelectIT enables zero-cost, iterative refinement of instruction data. Contribution/Results: We introduce Selective Alpaca—the first high-quality subset (10% of Alpaca-GPT4) optimized via SelectIT—which consistently improves instruction-following performance across diverse base models (Llama-2/3, Qwen) and cross-domain tasks, even surpassing full-data fine-tuning baselines. This work establishes the first LLM uncertainty-driven, self-supervised data selection paradigm, offering a computationally efficient and cost-effective pathway to instruction alignment.
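The summary above describes ranking instruction samples by the LLM's own uncertainty about their quality. As a minimal sketch of that idea (not the paper's exact algorithm, which combines token-, sentence-, and model-level self-reflection), one can score each sample from a hypothetical rating distribution the LLM produces over quality grades, rewarding high expected quality and penalizing high entropy:

```python
import math

def select_it_score(rating_probs):
    """Score one instruction sample from an LLM's rating distribution.

    rating_probs: probabilities the LLM assigns to quality ratings 1..K
    (a hypothetical interface for illustration). Higher expected rating
    and lower entropy (i.e. higher self-confidence) yield a higher score.
    """
    k = len(rating_probs)
    expected = sum((i + 1) * p for i, p in enumerate(rating_probs))
    entropy = -sum(p * math.log(p) for p in rating_probs if p > 0)
    confidence = 1.0 - entropy / math.log(k)  # normalized to [0, 1]
    return expected * confidence

def select_top_fraction(samples, frac=0.1):
    """Keep the top `frac` of samples by the uncertainty-aware score,
    mirroring Selective Alpaca's 10% subset of Alpaca-GPT4."""
    ranked = sorted(samples,
                    key=lambda s: select_it_score(s["rating_probs"]),
                    reverse=True)
    return ranked[:max(1, int(len(ranked) * frac))]
```

A confident high rating (e.g. 80% mass on grade 5) outscores a uniform, maximally uncertain distribution, which captures the intuition that the model's self-reported confidence carries a usable quality signal.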

📝 Abstract
Instruction tuning (IT) is crucial to tailoring large language models (LLMs) towards human-centric interactions. Recent advancements have shown that the careful selection of a small, high-quality subset of IT data can significantly enhance the performance of LLMs. Despite this, common approaches often rely on additional models or data, which increases costs and limits widespread adoption. In this work, we propose a novel approach, termed SelectIT, that capitalizes on the foundational capabilities of the LLM itself. Specifically, we exploit the intrinsic uncertainty present in LLMs to more effectively select high-quality IT data, without the need for extra resources. Furthermore, we introduce a curated IT dataset, the Selective Alpaca, created by applying SelectIT to the Alpaca-GPT4 dataset. Empirical results demonstrate that IT using Selective Alpaca leads to substantial model ability enhancement. The robustness of SelectIT has also been corroborated in various foundation models and domain-specific tasks. Our findings suggest that longer and more computationally intensive IT data may serve as superior sources of IT, offering valuable insights for future research in this area. Data, code, and scripts are freely available at https://github.com/Blue-Raincoat/SelectIT.
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Uncertainty Calibration
Efficient Instruction Tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

SelectIT
Self-inspection Uncertainty
Selective Refinement
👥 Authors
Liangxin Liu (Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China)
Xuebo Liu (Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China)
Derek F. Wong (Professor, Department of Computer and Information Science, University of Macau; research areas: Machine Translation, Neural Machine Translation, Natural Language Processing, Machine Learning)
Dongfang Li (Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China)
Ziyi Wang (Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China)
Baotian Hu (Harbin Institute of Technology (Shenzhen); research areas: LLM, MLLM, NLP)
Min Zhang (Institute of Computing and Intelligence, Harbin Institute of Technology, Shenzhen, China)