HyperINF: Unleashing the HyperPower of the Schulz's Method for Data Influence Estimation

📅 2024-10-07
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address the low estimation accuracy and high computational overhead of data influence functions on large-scale models, this paper proposes an efficient influence estimation framework with strong convergence guarantees. Methodologically, it applies hyperpower matrix inversion, specifically the Schulz iteration, to approximate the inverse Hessian inside the influence function, and combines it with a low-rank approximation of the Hessian via the generalized Fisher information matrix (GFIM) on LoRA-tuned models. This combination reduces memory and computational costs to constants independent of the LoRA rank, while the iteration's rigorous convergence guarantee preserves estimation accuracy. Empirical evaluation on a synthetic matrix-inversion simulation and real-world attribution tasks, including mislabeled data detection and fine-tuning data selection for LLMs and VLMs, demonstrates substantial improvements over state-of-the-art baselines.
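The core numerical tool named in the summary is Schulz's hyperpower iteration, X_{k+1} = X_k (2I − A X_k), which converges quadratically to A⁻¹ whenever the initial residual ‖I − A X₀‖ < 1. A minimal NumPy sketch (the matrix and iteration count here are illustrative, not taken from the paper):

```python
import numpy as np

def schulz_inverse(A, num_iters=30):
    """Approximate A^{-1} via Schulz's hyperpower iteration:
        X_{k+1} = X_k (2I - A X_k).
    Quadratic convergence holds when ||I - A X_0|| < 1."""
    n = A.shape[0]
    # Classic initialization X_0 = A^T / (||A||_1 * ||A||_inf),
    # which guarantees the convergence condition above.
    X = A.T / (np.linalg.norm(A, 1) * np.linalg.norm(A, np.inf))
    I = np.eye(n)
    for _ in range(num_iters):
        X = X @ (2 * I - A @ X)
    return X

# Example: invert a well-conditioned SPD matrix (a Hessian-like stand-in).
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)   # symmetric positive definite
X = schulz_inverse(A)
```

Because each step uses only matrix multiplications, the iteration is GPU-friendly; the quadratic error contraction is what gives the method its accuracy edge over heuristic inverse approximations.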

📝 Abstract
Influence functions provide a principled method to assess the contribution of individual training samples to a specific target. Yet, their high computational costs limit their applications on large-scale models and datasets. Existing methods proposed for influence function approximation have significantly reduced the computational overheads. However, they mostly suffer from inaccurate estimation due to the lack of strong convergence guarantees from the algorithm. The family of hyperpower methods is well-known for its rigorous convergence guarantees on matrix inverse approximation, while the matrix multiplication operations involved can incur intractable memory and computation costs on large-scale models. We propose HyperINF, an efficient and accurate influence function approximation method which leverages the hyperpower method, specifically Schulz's iterative algorithm. To deal with the computation-intensive matrix multiplication, we incorporate the generalized Fisher information matrix (GFIM) as a low-rank approximation of the Hessian matrix, which reduces the memory and computation overheads to constant costs independent of the rank on LoRA-tuned models. We first demonstrate the superior accuracy and stability of our method compared to other baselines through a synthetic convergence simulation for matrix inversion. We further validate the efficacy of our method through extensive real-world data attribution tasks, including mislabeled data detection and data selection for LLM and VLM fine-tuning. On LoRA-tuned models, HyperINF achieves superior downstream performance with minimal memory and computational overhead, while other baselines suffer from significant degradation. Our codebase is available at https://github.com/Blackzxy/HyperINF.
Problem

Research questions and friction points this paper is trying to address.

High computational costs limit influence function applications
Existing methods lack accuracy due to weak convergence
Matrix multiplication is memory-intensive for large-scale models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Schulz's method for influence estimation
Incorporates GFIM for low-rank Hessian approximation
Reduces memory and computation to constant costs
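Putting the innovations above together: the influence of a training sample z on a target is conventionally scored as I(z) = −g_test⊤ H⁻¹ g_z, with the GFIM standing in for the Hessian H and the Schulz iteration computing its inverse. A self-contained sketch under illustrative assumptions (random stand-in gradients, a hypothetical damping term λ for invertibility; dimensions are arbitrary, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical per-sample gradients in a small (e.g. LoRA-projected)
# parameter space: n training samples, d parameters.
n, d, lam = 200, 16, 1e-3
G = rng.standard_normal((n, d))     # rows: per-sample training gradients
g_test = rng.standard_normal(d)     # gradient of the target/test loss

# Generalized Fisher information matrix, damped for invertibility.
F = G.T @ G / n + lam * np.eye(d)

# Schulz iteration for F^{-1}: X_{k+1} = X_k (2I - F X_k),
# initialized so that ||I - F X_0|| < 1.
X = F.T / (np.linalg.norm(F, 1) * np.linalg.norm(F, np.inf))
for _ in range(30):
    X = X @ (2 * np.eye(d) - F @ X)

# Influence score of each training sample on the target loss:
# I(z_i) = - g_test^T F^{-1} g_i  (more negative => upweighting z_i
# is estimated to reduce the target loss).
scores = -(G @ (X @ g_test))
```

Ranking samples by these scores is what drives the downstream tasks the card lists: flagging likely-mislabeled samples (harmful influence) and selecting fine-tuning data (helpful influence).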
Xinyu Zhou
Machine Learning and Optimization Lab, EPFL

Simin Fan
EPFL
Optimization, LLM

Martin Jaggi
EPFL
Machine Learning, Optimization