Scalable Data Attribution via Forward-Only Test-Time Inference

📅 2025-11-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of efficient data attribution in large-scale pre-trained models. We propose a forward-only attribution method that eliminates backward propagation at query time by shifting computational overhead from the query phase to the training phase: short-horizon gradient propagation approximates how each training sample influences model parameters, so attribution scores can be extracted via forward passes alone during inference. While preserving theoretical consistency with influence functions, the method reduces inference-time overhead by several orders of magnitude compared to state-of-the-art approaches such as TRAK. On standard MLP benchmarks, it matches or exceeds the attribution fidelity of existing methods. This work establishes a scalable, real-time, and high-fidelity paradigm for data influence assessment, with direct applications in model debugging, auditing, and data valuation.
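For reference, the classical first-order influence-function target that the summary alludes to (the standard formulation from the influence-function literature, not this paper's own notation) estimates the effect of removing a training example $z_j$ on a test query $z$ as:

```latex
\mathcal{I}(z_j, z) \;=\; \nabla_\theta L(z, \hat{\theta})^{\top}\, H_{\hat{\theta}}^{-1}\, \nabla_\theta L(z_j, \hat{\theta}),
\qquad
H_{\hat{\theta}} \;=\; \frac{1}{n} \sum_{i=1}^{n} \nabla_\theta^{2} L(z_i, \hat{\theta})
```

The per-query gradient $\nabla_\theta L(z, \hat{\theta})$ and the Hessian inverse $H_{\hat{\theta}}^{-1}$ are exactly the inference-time costs the proposed method avoids.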

📝 Abstract
Data attribution seeks to trace model behavior back to the training examples that shaped it, enabling debugging, auditing, and data valuation at scale. Classical influence-function methods offer a principled foundation but remain impractical for modern networks because they require expensive backpropagation or Hessian inversion at inference. We propose a data attribution method that preserves the same first-order counterfactual target while eliminating per-query backward passes. Our approach simulates each training example's parameter influence through short-horizon gradient propagation during training and later reads out attributions for any query using only forward evaluations. This design shifts computation from inference to simulation, reflecting real deployment regimes where a model may serve billions of user queries but originate from a fixed, finite set of data sources (for example, a large language model trained on diverse corpora while compensating a specific publisher such as the New York Times). Empirically, on standard MLP benchmarks, our estimator matches or surpasses state-of-the-art baselines such as TRAK on standard attribution metrics (LOO and LDS) while offering orders-of-magnitude lower inference cost. By combining influence-function fidelity with first-order scalability, our method provides a theoretical framework for practical, real-time data attribution in large pretrained models.
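The LDS (linear datamodeling score) metric named in the abstract can be sketched as follows: sample random training subsets, predict each subset's counterfactual model output as the sum of the attributed scores, and rank-correlate the predictions against the actual outputs. This is a minimal illustration with simulated scores (all variable names and the noise model are my assumptions, not the paper's setup):

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n_train, n_subsets = 100, 50

# Hypothetical ground-truth influence of each training example on one query.
true_influence = rng.normal(size=n_train)
# A noisy attribution estimate standing in for any method's scores.
estimated = true_influence + 0.3 * rng.normal(size=n_train)

# Random 50% training subsets; each row selects the examples kept.
masks = (rng.random((n_subsets, n_train)) < 0.5).astype(float)

# Predicted counterfactual output: sum of attributed scores over the subset.
predicted_outputs = masks @ estimated
# "Actual" output, simulated here from the ground-truth influences.
actual_outputs = masks @ true_influence

# LDS is the Spearman rank correlation between the two.
lds, _ = spearmanr(predicted_outputs, actual_outputs)
print(f"LDS (Spearman): {lds:.2f}")
```

A perfect attribution method would give an LDS of 1.0; the 0.3-noise estimate above still ranks subsets nearly correctly because errors average out over each subset sum.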
Problem

Research questions and friction points this paper is trying to address.

Traces model behavior to training data sources
Eliminates expensive backpropagation during inference
Enables scalable data attribution for large models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Forward-only test-time inference for data attribution
Simulates parameter influence via short-horizon gradient propagation
Shifts computation from inference to training simulation
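The three bullets above can be illustrated with a toy sketch on linear regression. Everything here is a hedged simplification under my own assumptions (a single SGD pass, squared loss, and a one-step "short horizon"), not the paper's implementation: during training we record each example's parameter step as its influence, and at query time we read out a counterfactual loss change using only forward evaluations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy linear regression fit with one short-horizon SGD pass.
n, d, lr = 20, 5, 0.1
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d)

theta = np.zeros(d)
per_example_step = np.zeros((n, d))  # Δθ_i: example i's recorded parameter influence
for i in range(n):
    grad = (X[i] @ theta - y[i]) * X[i]  # squared-loss gradient for example i
    step = -lr * grad
    per_example_step[i] = step           # record the short-horizon influence
    theta = theta + step                 # apply the update

def attribution(x_q, y_q):
    """Forward-only readout: estimate the counterfactual loss change from
    undoing example i's recorded step, using only forward evaluations."""
    base = 0.5 * (x_q @ theta - y_q) ** 2
    scores = np.empty(n)
    for i in range(n):
        theta_wo = theta - per_example_step[i]  # parameters without example i's step
        scores[i] = 0.5 * (x_q @ theta_wo - y_q) ** 2 - base
    return scores  # positive score: removing example i would hurt this query

scores = attribution(X[0], y[0])
```

No gradient of the query loss is ever taken: the per-query cost is forward evaluations against the recorded steps, which is the inference-versus-training trade the Innovation bullets describe.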