Learning Parameter Sharing with Tensor Decompositions and Sparsity

📅 2024-11-14
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the large parameter counts and high computational overhead of deploying large models on resource-constrained devices, this paper proposes Fine-grained Parameter Sharing (FiPS), a framework that combines SVD-guided initialization, block-wise reconstruction-error minimization, and a shared-basis-plus-sparse-factor representation. FiPS enables cross-layer, learnable, neuron-level parameter reuse in the MLP modules of both Vision Transformers (ViTs) and Large Language Models (LLMs). By pairing low-rank tensor decomposition with sparse per-neuron factors, FiPS balances model expressivity against compression efficiency. It achieves 50–75% MLP parameter reduction on DeiT-B and Swin-L with less than 1 percentage point of Top-1 accuracy degradation, and 40–50% reduction on Gemma-2 and Llama-3 models with negligible perplexity degradation, offering a scalable parameter-sharing approach for efficient model deployment.

πŸ“ Abstract
Large neural networks exhibit exceptional performance across numerous tasks, yet their considerable size often hinders deployment on resource-constrained systems. While various model compression strategies have been well studied, parameter sharing remains underexplored. In this paper, we introduce Fine-grained Parameter Sharing (FiPS), a novel algorithm that leverages parameter sharing, tensor decomposition, and sparsity to effectively compress large-scale Vision Transformers (ViTs) and Large Language Models (LLMs). FiPS employs a shared base and sparse factors to represent neurons across multi-layer perceptron (MLP) modules, where initialization is guided by Singular Value Decomposition (SVD) and subsequent optimization is conducted through block-wise reconstruction error minimization. Experimental results show that FiPS reduces the parameter budget of MLP modules by 50-75% for DeiT-B and Swin-L and by 40-50% for various Gemma-2 and Llama-3 models while maintaining ViT model accuracy within 1% pt. of the original and LLM perplexity with negligible degradation.
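The shared-base-plus-sparse-factor idea from the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under assumed shapes, not the paper's implementation: the shared basis is taken from an SVD of the stacked MLP weights (the "SVD-guided initialization"), and each layer is then represented only by a sparse factor over that common basis (the 25% density level is illustrative).

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, r = 64, 256, 16          # model dim, hidden dim, shared-basis rank
num_layers = 4

# Per-layer MLP weights to compress (random stand-ins for real checkpoints).
weights = [rng.standard_normal((d, h)) for _ in range(num_layers)]

# SVD-guided initialization: stack the layers and take the top-r left
# singular vectors as a basis shared across all layers.
stacked = np.concatenate(weights, axis=1)          # (d, num_layers * h)
U, _, _ = np.linalg.svd(stacked, full_matrices=False)
basis = U[:, :r]                                   # (d, r), shared

# Per-layer factors: project onto the shared basis, then hard-threshold
# to keep only the largest-magnitude entries (structured sparsity).
def sparse_factor(w, basis, keep=0.25):
    f = basis.T @ w                                # (r, h) dense factor
    cutoff = np.quantile(np.abs(f), 1.0 - keep)
    return np.where(np.abs(f) >= cutoff, f, 0.0)

factors = [sparse_factor(w, basis) for w in weights]

# Reconstruction: W_l ≈ basis @ S_l; only `basis` is stored once.
errors = [np.linalg.norm(w - basis @ f) / np.linalg.norm(w)
          for w, f in zip(weights, factors)]
print([round(e, 3) for e in errors])
```

The compression comes from storing one dense basis plus several sparse factors instead of each full weight matrix; on these random stand-ins the relative reconstruction error is of course far worse than on real, correlated transformer weights.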
Problem

Research questions and friction points this paper is trying to address.

Compress large neural networks efficiently
Enhance parameter sharing in MLP modules
Maintain model accuracy with reduced parameters
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained Parameter Sharing algorithm
Tensor decomposition and sparsity
Singular Value Decomposition initialization
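The block-wise reconstruction-error optimization named above can be sketched for a single MLP block. The paper's exact procedure is not given here; this is a hedged gradient-descent illustration under assumed shapes, where the sparsity pattern is fixed after SVD initialization and only the kept factor entries are tuned to match the block's outputs on calibration data.

```python
import numpy as np

rng = np.random.default_rng(1)
d, h, r = 32, 128, 8               # block input dim, hidden dim, basis rank

W = rng.standard_normal((d, h))    # original block weight (stand-in)
X = rng.standard_normal((256, d))  # calibration activations for this block

# SVD-guided initialization: B holds the top-r left singular vectors,
# F the scaled right singular vectors, so W ≈ B @ F.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
B = U[:, :r]                       # orthonormal basis (d, r)
F = s[:r, None] * Vt[:r]           # dense factor (r, h)
mask = np.abs(F) >= np.quantile(np.abs(F), 0.5)  # fixed 50% sparsity pattern

# Block-wise objective: minimize ||X W - X B (F*mask)||_F^2 over F,
# i.e. match the block's outputs rather than the raw weights.
target = X @ W
den = np.linalg.norm(target)
err0 = np.linalg.norm(target - X @ B @ (F * mask)) / den

lr = 1.0 / np.linalg.norm(X, ord=2) ** 2   # safe step for this quadratic
for _ in range(200):
    resid = X @ B @ (F * mask) - target
    F -= lr * ((B.T @ X.T @ resid) * mask)  # gradient step on kept entries

err = np.linalg.norm(target - X @ B @ (F * mask)) / den
print(round(err0, 3), "->", round(err, 3))
```

Optimizing against the block's outputs on calibration data, rather than the weights themselves, lets the compressed factors spend their limited capacity on the directions the data actually exercises.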