GraSS: Scalable Influence Function with Sparse Gradient Compression

📅 2025-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Influence functions suffer from prohibitive computational and memory overhead in large-scale models due to per-sample gradient computations. To address this, we propose GraSS, an efficient compression framework that explicitly models and exploits the intrinsic sparsity of per-sample gradients, together with FactGraSS, a variant specialized for linear layers. The framework combines sparse gradient compression, low-rank factorization, and approximate Hessian-vector product computation to achieve sub-linear time and space complexity. Evaluated on billion-parameter models, FactGraSS delivers up to 165% faster throughput than prior state-of-the-art baselines while preserving high influence fidelity. This work overcomes the scalability bottleneck of gradient-based attribution methods on very large training datasets, enabling practical model diagnostics and interpretability analysis at scale.
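The core idea of sparse gradient compression can be illustrated with a minimal sketch: keep only the top-k entries of a per-sample gradient by magnitude, then apply a small random projection to the surviving coordinates. This is an assumption-laden illustration, not the paper's actual API; `compress_grad`, the parameter defaults, and the dense projection matrix are all illustrative.

```python
import numpy as np

def compress_grad(grad, k=64, d_out=16, seed=0):
    """Illustrative sketch (not the paper's implementation): keep the top-k
    gradient entries by magnitude, then randomly project them to d_out dims."""
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]        # indices of top-k entries
    rng = np.random.default_rng(seed)
    # Coordinate-indexed random projection; materialized fully here for clarity,
    # but only the k rows at `idx` are ever needed, i.e. O(k * d_out) work
    # instead of O(len(flat) * d_out).
    proj = rng.standard_normal((flat.size, d_out)) / np.sqrt(d_out)
    return idx, flat[idx] @ proj[idx]

g = np.random.default_rng(1).standard_normal(10_000)    # a fake per-sample gradient
idx, z = compress_grad(g)
print(idx.shape, z.shape)                               # (64,) (16,)
```

Because only k coordinates survive sparsification, both the projection cost and the stored representation scale with k and d_out rather than with the full parameter count, which is where the sub-linear complexity comes from.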

📝 Abstract
Gradient-based data attribution methods, such as influence functions, are critical for understanding the impact of individual training samples without requiring repeated model retraining. However, their scalability is often limited by the high computational and memory costs associated with per-sample gradient computation. In this work, we propose GraSS, a novel gradient compression algorithm, and its variant FactGraSS, specialized for linear layers, both of which explicitly leverage the inherent sparsity of per-sample gradients to achieve sub-linear space and time complexity. Extensive experiments demonstrate the effectiveness of our approach, achieving substantial speedups while preserving data-influence fidelity. In particular, FactGraSS achieves up to 165% faster throughput on billion-scale models compared to previous state-of-the-art baselines. Our code is publicly available at https://github.com/TRAIS-Lab/GraSS.
Problem

Research questions and friction points this paper is trying to address.

Reducing the computational and memory cost of per-sample gradient computation
Improving the scalability of influence-function methods
Making data attribution practical for large-scale models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages the inherent sparsity of per-sample gradients for compression
Achieves sub-linear space and time complexity
Delivers up to 165% faster throughput on billion-scale models
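Once per-sample gradients are compressed, pairwise influence scores reduce to cheap inner products in the small compressed space. A minimal sketch, assuming an identity-Hessian approximation (i.e., plain gradient dot products rather than the paper's approximate Hessian-vector products); `influence_scores` and the shapes are illustrative:

```python
import numpy as np

def influence_scores(train_grads_c, test_grad_c):
    """Approximate each training sample's influence on a test point as an
    inner product of compressed gradients (identity-Hessian simplification)."""
    return train_grads_c @ test_grad_c

rng = np.random.default_rng(0)
G = rng.standard_normal((1000, 16))   # compressed gradients, one row per training sample
q = rng.standard_normal(16)           # compressed gradient of a test point
scores = influence_scores(G, q)
top5 = np.argsort(scores)[-5:][::-1]  # five highest-scoring training samples
print(scores.shape, top5.shape)       # (1000,) (5,)
```

Scoring a test point against n training samples then costs O(n * d_out) rather than O(n * p) for a p-parameter model, which is what makes attribution over very large training sets tractable.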