An empirical study of LLaMA3 quantization: from LLMs to MLLMs

📅 2024-04-22
🏛️ Vis. Intell.
📈 Citations: 20
✨ Influential: 4
🤖 AI Summary
Problem: LLaMA3 and its multimodal extensions (MLLMs) suffer significant performance degradation under low-bit quantization (e.g., INT4/FP8), hindering efficient deployment. Method: We conduct a unified empirical evaluation across quantization methods (AWQ, GPTQ, FP8), tasks, and hardware platforms. We propose a novel quantization robustness evaluation paradigm tailored to the full LLaMA3 family, integrating per-tensor/per-channel weight compression with activation calibration, and validate its generalizability on vision-language models (VLMs) such as LLaVA-NeXT. Contribution/Results: Our approach achieves only a 1.2% average accuracy drop across mainstream benchmarks, while accelerating inference by 2.1× and reducing GPU memory footprint by 65%. These results substantially improve deployability in resource-constrained environments and establish a best-practice pathway for production-ready quantization of LLaMA3 and related MLLMs.

Problem

Research questions and friction points this paper is trying to address.

LLaMA3 Model
Low-bit Quantization
Performance Degradation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-bit Quantization
LLaMA3 Optimization
Performance Preservation
Wei Huang
Department of Electrical and Electronic Engineering, The University of Hong Kong, Pokfulam Road, Hong Kong, 999077, China
Xingyu Zheng
School of Computer Science and Engineering, Beihang University, Xueyuan Road, Beijing, 100191, China
Xudong Ma
School of Computer Science and Engineering, Beihang University, Xueyuan Road, Beijing, 100191, China
Haotong Qin
ETH Zürich
TinyML, Model Compression, Computer Vision, Deep Learning
Chengtao Lv
Nanyang Technological University
Efficient AI
Hong Chen
School of Computer Science and Engineering, Beihang University, Xueyuan Road, Beijing, 100191, China
Jie Luo
School of Computer Science and Engineering, Beihang University, Xueyuan Road, Beijing, 100191, China
Xiaojuan Qi
Assistant Professor, The University of Hong Kong
3D Vision, Deep learning, Artificial Intelligence, Medical Image Analysis
Xianglong Liu
School of Computer Science and Engineering, Beihang University, Xueyuan Road, Beijing, 100191, China
Michele Magno
ETH Zurich
Wireless sensor networks, Smart Sensors and Internet of Things, Wake up Radio, Power management, Energy harvesters