🤖 AI Summary
This paper addresses the challenge of efficient data attribution in large-scale pre-trained models. We propose a forward-only attribution method that eliminates backward propagation by shifting computational overhead from the query phase to the training phase: short-horizon gradient propagation approximates how training samples influence model parameters, so attribution scores can be extracted with forward passes alone at inference time. While preserving theoretical consistency with influence functions, the method reduces inference-time overhead by several orders of magnitude compared to state-of-the-art approaches such as TRAK, and on standard MLP benchmarks it matches or exceeds the attribution fidelity of existing methods. This work establishes a scalable, real-time, and high-fidelity paradigm for data influence assessment, with direct applications in model debugging, auditing, and data valuation.
📝 Abstract
Data attribution seeks to trace model behavior back to the training examples that shaped it, enabling debugging, auditing, and data valuation at scale. Classical influence-function methods offer a principled foundation but remain impractical for modern networks because they require expensive backpropagation or Hessian inversion at inference time. We propose a data attribution method that preserves the first-order counterfactual target of influence functions while eliminating per-query backward passes. Our approach simulates each training example's influence on the parameters through short-horizon gradient propagation during training and later reads out attributions for any query using only forward evaluations. This design shifts computation from inference to training-time simulation, reflecting real deployment regimes in which a model may serve billions of user queries but originates from a fixed, finite set of data sources (for example, a large language model trained on diverse corpora while compensating a specific publisher such as the New York Times). Empirically, on standard MLP benchmarks, our estimator matches or surpasses state-of-the-art baselines such as TRAK on standard attribution metrics (leave-one-out influence, LOO, and the linear datamodeling score, LDS) while offering orders-of-magnitude lower inference cost. By combining influence-function fidelity with first-order scalability, our method provides a principled framework for practical, real-time data attribution in large pretrained models.
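The simulate-then-read-out scheme can be illustrated with a toy first-order sketch (hypothetical code, not the paper's implementation). During per-example SGD on a small linear model, each example's parameter updates are accumulated as a crude estimate of its influence on the final parameters; at query time, an attribution score is computed from forward passes alone by comparing the query loss with and without that accumulated contribution:

```python
import numpy as np

# Toy sketch (illustrative only, not the paper's method): accumulate each
# training example's SGD parameter updates during training, then attribute
# a query using only forward evaluations at inference time.

rng = np.random.default_rng(0)
n, d = 20, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

theta = np.zeros(d)
lr = 0.1
deltas = np.zeros((n, d))  # per-example accumulated parameter contribution

for epoch in range(50):  # plain per-example SGD on squared error
    for i in range(n):
        grad = (X[i] @ theta - y[i]) * X[i]
        step = lr * grad
        theta -= step
        deltas[i] += step  # example i's contribution to the trajectory

def query_loss(params, xq, yq):
    return 0.5 * (xq @ params - yq) ** 2

# Forward-only read-out: no backward pass at query time. Score of example i
# = increase in query loss when i's accumulated contribution is removed.
xq, yq = X[0], y[0]
scores = np.array([query_loss(theta - deltas[i], xq, yq)
                   - query_loss(theta, xq, yq) for i in range(n)])
```

In this toy setting, removing the contribution of the query's own training example should raise the query loss, so its score is positive; the per-query cost is two forward passes per training example, with no gradients computed at inference.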