Learning Parameter Sharing with Tensor Decompositions and Sparsity

📅 2024-11-14
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the large parameter counts and high computational overhead of deploying large models on resource-constrained devices, this paper proposes Fine-grained Parameter Sharing (FiPS), a framework that combines SVD-guided initialization, block-wise reconstruction-error minimization, and a shared-basis-plus-sparse-factor representation. FiPS enables cross-layer, learnable, neuron-level parameter reuse in the MLP modules of both Vision Transformers (ViTs) and Large Language Models (LLMs). By pairing low-rank tensor decomposition with sparse per-neuron factors, FiPS balances model expressivity against compression efficiency. It achieves 50–75% MLP parameter reduction on DeiT-B and Swin-L with less than 1 percentage point of Top-1 accuracy degradation, and 40–50% reduction on Gemma-2 and Llama-3 models with negligible perplexity degradation, offering a scalable parameter-sharing approach for efficient model deployment.

πŸ“ Abstract
Large neural networks exhibit exceptional performance across numerous tasks, yet their considerable size often hinders deployment on resource-constrained systems. While various model compression strategies have been well studied, parameter sharing remains underexplored. In this paper, we introduce Fine-grained Parameter Sharing (FiPS), a novel algorithm that leverages parameter sharing, tensor decomposition, and sparsity to effectively compress large-scale Vision Transformers (ViTs) and Large Language Models (LLMs). FiPS employs a shared base and sparse factors to represent neurons across multi-layer perceptron (MLP) modules, where initialization is guided by Singular Value Decomposition (SVD) and subsequent optimization is conducted through block-wise reconstruction error minimization. Experimental results show that FiPS reduces the parameter budget of MLP modules by 50-75% for DeiT-B and Swin-L and by 40-50% for various Gemma-2 and Llama-3 models while maintaining ViT model accuracy within 1% pt. of the original and LLM perplexity with negligible degradation.
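The shared-base-plus-sparse-factor idea from the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under assumed shapes, not the paper's implementation: the shared basis is taken from an SVD of the stacked MLP weights (the "SVD-guided initialization"), and each layer is then represented only by a sparse factor over that common basis (the 25% density level is illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, r = 64, 256, 16          # model dim, hidden dim, shared-basis rank
num_layers = 4

# Per-layer MLP weights to compress (random stand-ins for real checkpoints).
weights = [rng.standard_normal((d, h)) for _ in range(num_layers)]

# SVD-guided initialization: stack the layers and take the top-r left
# singular vectors as a basis shared across all layers.
stacked = np.concatenate(weights, axis=1)          # (d, num_layers * h)
U, _, _ = np.linalg.svd(stacked, full_matrices=False)
basis = U[:, :r]                                   # (d, r), shared

# Per-layer factors: project onto the shared basis, then hard-threshold
# to keep only the largest-magnitude entries (structured sparsity).
def sparse_factor(w, basis, keep=0.25):
    f = basis.T @ w                                # (r, h) dense factor
    cutoff = np.quantile(np.abs(f), 1.0 - keep)
    return np.where(np.abs(f) >= cutoff, f, 0.0)

factors = [sparse_factor(w, basis) for w in weights]

# Reconstruction: W_l ≈ basis @ S_l; only `basis` is stored once.
errors = [np.linalg.norm(w - basis @ f) / np.linalg.norm(w)
          for w, f in zip(weights, factors)]
print([round(e, 3) for e in errors])
```

The compression comes from storing one dense basis plus several sparse factors instead of each full weight matrix; on these random stand-ins the relative reconstruction error is of course far worse than on real, correlated transformer weights.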
Problem

Research questions and friction points this paper is trying to address.

Compress large neural networks efficiently
Enhance parameter sharing in MLP modules
Maintain model accuracy with reduced parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained Parameter Sharing algorithm
Tensor decomposition and sparsity
Singular Value Decomposition initialization
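The block-wise reconstruction-error optimization named above can be sketched for a single MLP block. The paper's exact procedure is not given here; this is a hedged gradient-descent illustration under assumed shapes, where the sparsity pattern is fixed after SVD initialization and only the kept factor entries are tuned to match the block's outputs on calibration data.

```python
import numpy as np

rng = np.random.default_rng(1)
d, h, r = 32, 128, 8               # block input dim, hidden dim, basis rank

W = rng.standard_normal((d, h))    # original block weight (stand-in)
X = rng.standard_normal((256, d))  # calibration activations for this block

# SVD-guided initialization: B holds the top-r left singular vectors,
# F the scaled right singular vectors, so W ≈ B @ F.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
B = U[:, :r]                       # orthonormal basis (d, r)
F = s[:r, None] * Vt[:r]           # dense factor (r, h)
mask = np.abs(F) >= np.quantile(np.abs(F), 0.5)  # fixed 50% sparsity pattern

# Block-wise objective: minimize ||X W - X B (F*mask)||_F^2 over F,
# i.e. match the block's outputs rather than the raw weights.
target = X @ W
den = np.linalg.norm(target)
err0 = np.linalg.norm(target - X @ B @ (F * mask)) / den

lr = 1.0 / np.linalg.norm(X, ord=2) ** 2   # safe step for this quadratic
for _ in range(200):
    resid = X @ B @ (F * mask) - target
    F -= lr * ((B.T @ X.T @ resid) * mask)  # gradient step on kept entries

err = np.linalg.norm(target - X @ B @ (F * mask)) / den
print(round(err0, 3), "->", round(err, 3))
```

Optimizing against the block's outputs on calibration data, rather than the weights themselves, lets the compressed factors spend their limited capacity on the directions the data actually exercises.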