AI Summary
To address the challenges of large parameter counts and poor noise robustness in Parameter-Efficient Fine-Tuning (PEFT) for edge AI hardware deployment, this paper proposes GRASP, a lightweight PEFT framework that shares parameters across groups of token-representation dimensions to learn scaling and shifting vectors, reducing trainable parameters by approximately 10× compared to LoRA and BitFit while preserving task-adaptation capability. Furthermore, the paper introduces StochGRASP, the first PEFT method to explicitly incorporate hardware-level weight perturbation modeling: it applies Gaussian-distributed stochastic perturbations to adapter weights and employs a noise-aware loss for probabilistic weight modulation. On the GLUE and E2E NLG benchmarks, GRASP matches or surpasses state-of-the-art PEFT methods in accuracy, while StochGRASP demonstrates significantly enhanced robustness under multiple levels of hardware-induced noise, establishing a new paradigm for reliable PEFT deployment on resource-constrained edge devices.
Abstract
Parameter-efficient fine-tuning (PEFT) provides a scalable alternative to full-model adaptation by updating only a small subset of parameters in large pre-trained models. We introduce GRASP - GRouped Activation Shared Parameterization - a lightweight PEFT framework that partitions the D-dimensional token representations of selected layers into K << D groups and learns a shared scaling and shifting vector for each group. This grouped modulation significantly reduces the number of trainable parameters while preserving the model's ability to learn task-specific features. Building on this formulation, we further propose StochGRASP, which learns Gaussian distributions as perturbations to the pre-trained weights rather than deterministic values. This probabilistic parameterization, together with a noise-aware loss function, enables modeling hardware-level variability in programmed weights and significantly improves robustness under non-ideal inference conditions, an important requirement for deployment on emerging edge AI hardware. Across GLUE (RoBERTa-base & RoBERTa-large) and E2E NLG (GPT-2 Medium), GRASP matches or exceeds the performance of established PEFT methods while achieving an order-of-magnitude reduction in trainable parameters compared to LoRA and BitFit. Under varying levels of noise, StochGRASP consistently outperforms deterministic variants, demonstrating its suitability for energy-efficient and noise-prone hardware platforms.
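The grouped modulation and the probabilistic perturbation described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the function names, the contiguous assignment of features to groups, and the additive form of the noise are all assumptions made for clarity.

```python
import numpy as np

def grasp_modulate(h, gamma, beta):
    """GRASP-style grouped scale-and-shift of token representations.

    h     : (T, D) token representations from a selected layer
    gamma : (K,) one shared scaling value per group
    beta  : (K,) one shared shifting value per group
    All D // K features in a group share the same (gamma_k, beta_k),
    so only 2*K parameters are trained instead of 2*D.
    """
    T, D = h.shape
    K = gamma.shape[0]
    assert D % K == 0, "D must be divisible by the number of groups K"
    # Assumption: contiguous feature blocks form the groups; the paper's
    # exact grouping scheme may differ.
    group = np.repeat(np.arange(K), D // K)  # (D,) group index per feature
    return h * gamma[group] + beta[group]

def stochgrasp_perturb(w, mu, sigma, rng):
    """StochGRASP-style probabilistic weights (illustrative): instead of a
    deterministic update, each weight carries a learned Gaussian
    N(mu, sigma^2) perturbation that models hardware-level programming
    noise at inference time."""
    return w + mu + sigma * rng.standard_normal(w.shape)

# Toy example: D = 8 features in K = 2 groups -> only 4 trainable values.
h = np.ones((3, 8))
out = grasp_modulate(h, np.array([2.0, 0.5]), np.array([0.0, 1.0]))
# First group of features is scaled by 2.0; second is scaled by 0.5
# and shifted by 1.0.
```

Training would optimize gamma, beta (and, for StochGRASP, mu and sigma) with the base model frozen; the noise-aware loss is omitted here.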