Evolution of Meta's LLaMA Models and Parameter-Efficient Fine-Tuning of Large Language Models: A Survey

📅 2025-10-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of efficiently adapting large language models (LLMs). It systematically reviews the architectural evolution and adaptation techniques across Meta AI's LLaMA series (v1–v4), covering foundational models, Mixture-of-Experts (MoE) variants, and multimodal extensions. To address adaptation efficiency, it offers the first unified, structured comparison of five mainstream Parameter-Efficient Fine-Tuning (PEFT) methods (LoRA, QLoRA, LLaMA-Adapter V1/V2, and LLaMA-Excitor), spanning quantization, low-rank decomposition, and adapter-injection strategies across model scales from 7B to 288B parameters. Reported experiments show that updating only 0.1%–3% of parameters matches or exceeds full fine-tuning on instruction-following, multimodal understanding, and domain-specific tasks (e.g., healthcare, law). The core contribution is a comprehensive, generation-spanning analytical framework for LLaMA and PEFT that yields mechanistic insight into high-performance transfer via minimal parameter updates, providing both theoretical grounding and practical paradigms for lightweight LLM deployment.
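The low-rank decomposition idea behind LoRA, one of the PEFT methods the survey compares, can be sketched in a few lines. This is a minimal illustration in NumPy, not code from the paper; the dimensions, rank, and scaling factor below are illustrative assumptions chosen to mimic a 7B-scale attention projection.

```python
import numpy as np

# Minimal LoRA sketch: instead of updating a full weight matrix W
# (d_out x d_in), LoRA trains a low-rank update B @ A with rank
# r << min(d_out, d_in), added to the frozen base weights.
# All dimensions here are illustrative assumptions.

rng = np.random.default_rng(0)
d_in, d_out, r = 4096, 4096, 8           # rough 7B-scale projection
alpha = 16                               # LoRA scaling hyperparameter

W = rng.normal(size=(d_out, d_in))       # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                 # trainable up-projection, zero-init

def lora_forward(x):
    """Forward pass: frozen base path plus scaled low-rank update."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B zero-initialized, the adapted model starts identical to the base.
assert np.allclose(lora_forward(x), W @ x)

# Only A and B are trained: r*(d_in + d_out) parameters vs d_in*d_out.
trainable = r * (d_in + d_out)
frozen = d_in * d_out
print(f"trainable fraction: {trainable / frozen:.2%}")   # 0.39%
```

At this rank, the trainable fraction (~0.4%) falls inside the 0.1%–3% range the summary cites for the surveyed PEFT methods, which is why low-rank updates scale so cheaply.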

📝 Abstract
This review surveys the rapid evolution of Meta AI's LLaMA (Large Language Model Meta AI) series, from LLaMA 1 through LLaMA 4, and the specialized parameter-efficient fine-tuning (PEFT) methods developed for these models. We first describe the LLaMA family of foundation models (7B–65B to 288B parameters), their architectures (including native multimodal and Mixture-of-Experts variants), and key performance characteristics. We then introduce the concept of PEFT, which adapts large pre-trained models by updating only a small subset of parameters, and review five PEFT methods that have been applied to LLaMA: LoRA (Low-Rank Adaptation), LLaMA-Adapter V1 and V2, LLaMA-Excitor, and QLoRA (Quantized LoRA). For each method we discuss its mechanism, parameter savings, and example applications to LLaMA (e.g., instruction tuning, multimodal tasks). We provide a structured analysis of model and adapter architectures, parameter counts, and benchmark results (including examples where fine-tuned LLaMA models outperform larger baselines). Finally, we examine real-world use cases where LLaMA-based models and PEFT have been successfully applied (e.g., legal and medical domains) and discuss ongoing challenges and future research directions, such as scaling to even larger contexts and improving robustness. This survey provides a one-stop resource for ML researchers and practitioners interested in LLaMA models and efficient fine-tuning strategies.
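The quantization side of QLoRA can also be illustrated briefly: the frozen base weights are stored in a compact low-bit format and dequantized on the fly, while the LoRA adapters remain in full precision. The sketch below uses simple per-block 4-bit absmax quantization as a stand-in; the actual method uses the NF4 data type with double quantization, and the matrix size and block size here are illustrative assumptions.

```python
import numpy as np

# Simplified illustration of QLoRA's storage trick: quantize the frozen
# base weights to 4 bits (here naive per-block absmax, NOT the NF4 format
# the real method uses) and dequantize when computing the forward pass.
# Shapes and block size are illustrative assumptions.

rng = np.random.default_rng(1)
W = rng.normal(size=(64, 64)).astype(np.float32)   # "frozen" base weight

def quantize_absmax4(w, block=64):
    """Per-block absmax quantization to 4-bit integers in [-7, 7]."""
    flat = w.reshape(-1, block)
    scale = np.abs(flat).max(axis=1, keepdims=True) / 7.0
    q = np.round(flat / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale, shape):
    """Recover an approximate float matrix from codes and per-block scales."""
    return (q * scale).reshape(shape).astype(np.float32)

q, scale = quantize_absmax4(W)
W_hat = dequantize(q, scale, W.shape)

# The low-bit reconstruction is a close approximation of the base weights;
# the small residual is what the full-precision LoRA adapters can absorb.
rel_err = np.linalg.norm(W - W_hat) / np.linalg.norm(W)
print(f"relative reconstruction error: {rel_err:.3f}")
```

Storing each weight as a 4-bit code plus a shared per-block scale is what lets QLoRA fine-tune models on a single GPU that would not fit in 16-bit precision.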
Problem

Research questions and friction points this paper is trying to address.

Surveying the evolution of Meta's LLaMA model series
Reviewing parameter-efficient fine-tuning methods for LLMs
Analyzing architectures, performance, and real-world applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Surveying LLaMA model evolution and parameter-efficient fine-tuning methods
Analyzing LoRA, QLoRA, and adapter architectures for efficient adaptation
Providing benchmark results and real-world applications of fine-tuned models