GraSS: Scalable Influence Function with Sparse Gradient Compression

📅 2025-05-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Influence functions suffer from prohibitive computational and memory overhead in large-scale models due to per-sample gradient computations. To address this, we propose GraSS, an efficient compression framework that explicitly models and exploits the intrinsic sparsity of per-sample gradients, together with FactGraSS, a variant specialized for linear layers. The framework combines sparse gradient compression, low-rank factorization, and approximate Hessian-vector product computation to achieve sub-linear time and space complexity. Evaluated on billion-parameter models, FactGraSS delivers up to 165% faster throughput than prior state-of-the-art baselines while preserving high influence fidelity. This work overcomes the scalability bottleneck of gradient-based attribution methods on very large training datasets, enabling practical model diagnostics and interpretability analysis at scale.
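The core idea of sparse gradient compression can be illustrated with a minimal sketch: keep only the top-k entries of a per-sample gradient by magnitude, then apply a small random projection to the surviving coordinates. This is an assumption-laden illustration, not the paper's actual API; `compress_grad`, the parameter defaults, and the dense projection matrix are all illustrative.

```python
import numpy as np

def compress_grad(grad, k=64, d_out=16, seed=0):
    """Illustrative sketch (not the paper's implementation): keep the top-k
    gradient entries by magnitude, then randomly project them to d_out dims."""
    flat = grad.ravel()
    idx = np.argpartition(np.abs(flat), -k)[-k:]        # indices of top-k entries
    rng = np.random.default_rng(seed)
    # Coordinate-indexed random projection; materialized fully here for clarity,
    # but only the k rows at `idx` are ever needed, i.e. O(k * d_out) work
    # instead of O(len(flat) * d_out).
    proj = rng.standard_normal((flat.size, d_out)) / np.sqrt(d_out)
    return idx, flat[idx] @ proj[idx]

g = np.random.default_rng(1).standard_normal(10_000)    # a fake per-sample gradient
idx, z = compress_grad(g)
print(idx.shape, z.shape)                               # (64,) (16,)
```

Because only k coordinates survive sparsification, both the projection cost and the stored representation scale with k and d_out rather than with the full parameter count, which is where the sub-linear complexity comes from.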

📝 Abstract
Gradient-based data attribution methods, such as influence functions, are critical for understanding the impact of individual training samples without requiring repeated model retraining. However, their scalability is often limited by the high computational and memory costs associated with per-sample gradient computation. In this work, we propose GraSS, a novel gradient compression algorithm, and its variant FactGraSS, specialized for linear layers, both of which explicitly leverage the inherent sparsity of per-sample gradients to achieve sub-linear space and time complexity. Extensive experiments demonstrate the effectiveness of our approach, achieving substantial speedups while preserving data-influence fidelity. In particular, FactGraSS achieves up to 165% faster throughput on billion-scale models compared to previous state-of-the-art baselines. Our code is publicly available at https://github.com/TRAIS-Lab/GraSS.
Problem

Research questions and friction points this paper is trying to address.

Reducing the computational and memory cost of per-sample gradient computation
Improving the scalability of influence-function methods
Making data attribution practical for large-scale models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages the inherent sparsity of per-sample gradients for compression
Achieves sub-linear space and time complexity
Delivers up to 165% faster throughput on billion-scale models
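Once per-sample gradients are compressed, pairwise influence scores reduce to cheap inner products in the small compressed space. A minimal sketch, assuming an identity-Hessian approximation (i.e., plain gradient dot products rather than the paper's approximate Hessian-vector products); `influence_scores` and the shapes are illustrative:

```python
import numpy as np

def influence_scores(train_grads_c, test_grad_c):
    """Approximate each training sample's influence on a test point as an
    inner product of compressed gradients (identity-Hessian simplification)."""
    return train_grads_c @ test_grad_c

rng = np.random.default_rng(0)
G = rng.standard_normal((1000, 16))   # compressed gradients, one row per training sample
q = rng.standard_normal(16)           # compressed gradient of a test point
scores = influence_scores(G, q)
top5 = np.argsort(scores)[-5:][::-1]  # five highest-scoring training samples
print(scores.shape, top5.shape)       # (1000,) (5,)
```

Scoring a test point against n training samples then costs O(n * d_out) rather than O(n * p) for a p-parameter model, which is what makes attribution over very large training sets tractable.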