🤖 AI Summary
High deployment costs and substantial performance degradation after compression hinder the practical adoption of large language models (LLMs). To address this, we propose Tailored LLaMA, a dual-driven efficient adaptation framework that integrates structural pruning with task-customized prompting. Our approach unifies task-constraint-aware pruning, LoRA-based low-rank fine-tuning, and few-shot prompt engineering. On LLaMA models pruned from 7B to 5B and 4B parameters, Tailored LLaMA achieves 20%–50% parameter compression with fine-tuning time under one hour. It attains mean classification accuracies of 95.68% at a 20% compression ratio and 86.54% at a 50% compression ratio, retaining over 65% of baseline performance even at 50% compression and significantly outperforming existing compression methods. This work eases the long-standing bottleneck in joint pruning and fine-tuning optimization, establishing a new state of the art in efficient LLM adaptation.
📝 Abstract
Large language models demonstrate impressive proficiency in language understanding and generation. Nonetheless, training these models from scratch, even the least complex billion-parameter variant, demands significant computational resources, rendering it economically impractical for many organizations. With large language models functioning as general-purpose task solvers, this paper investigates their task-specific fine-tuning. We employ task-specific datasets and prompts to fine-tune two pruned LLaMA models with 5 billion and 4 billion parameters. This process starts from the pre-trained weights and updates only a subset of them using the LoRA method. One challenge in fine-tuning the LLaMA model is crafting a precise prompt tailored to the specific task. To address this, we propose a novel approach to fine-tune the LLaMA model under two primary constraints: task specificity and prompt effectiveness. Our approach, Tailored LLaMA, first employs structural pruning to reduce the model size from 7B to 5B and 4B parameters. It then applies a carefully designed task-specific prompt and uses the LoRA method to accelerate fine-tuning. Moreover, fine-tuning each pruned model for less than one hour restores the mean accuracy of classification tasks to 95.68% at a 20% compression ratio and to 86.54% at a 50% compression ratio through few-shot learning with 50 shots. Our validation of Tailored LLaMA on these two pruned variants demonstrates that even when compressed to 50%, the models maintain over 65% of the baseline model accuracy on few-shot classification and generation tasks. These findings highlight the efficacy of our tailored approach in maintaining high performance with significantly reduced model sizes.
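The LoRA method referenced above fine-tunes only a pair of trainable low-rank matrices while the pretrained weights stay frozen, which is why the adaptation step is so cheap. The sketch below illustrates that idea on a single linear layer in NumPy; the dimensions, rank, and scaling factor are illustrative assumptions, not values taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pretrained weight of one (hypothetical) linear layer.
d, r, alpha = 64, 8, 16.0          # rank r and scale alpha are illustrative
W = rng.standard_normal((d, d))    # frozen: never updated during fine-tuning

# Trainable low-rank factors. B starts at zero, so the adapted layer
# initially reproduces the pretrained layer exactly.
A = rng.standard_normal((r, d)) * 0.01
B = np.zeros((d, r))

x = rng.standard_normal(d)
y = W @ x + (alpha / r) * (B @ (A @ x))   # adapted forward pass

# Only A and B receive gradients during fine-tuning:
# 2*d*r parameters instead of the d*d in the frozen weight.
trainable, frozen = A.size + B.size, W.size
```

Because the trainable parameter count scales with the rank `r` rather than with the full weight dimensions, updating these factors for a pruned model can complete in well under an hour, consistent with the fine-tuning times reported above.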