Infusion: Shaping Model Behavior by Editing Training Data via Influence Functions

📅 2026-02-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work proposes a method to steer model behavior through minimal perturbations to existing training data, without inserting new examples. Leveraging a scalable approximation of influence functions, the approach inversely constructs minimal edits to training samples that induce targeted shifts in model parameters. Influence functions are thus used for the inverse design of training data, demonstrating that modifying as little as 0.2% of samples (100 images in CIFAR-10) is sufficient to manipulate model behavior. The method successfully amplifies existing capabilities in language tasks and exhibits cross-architecture transferability. Experimental results validate its efficacy while also revealing inherent limitations, highlighting both the potential and constraints of influence-based data editing for model control.

๐Ÿ“ Abstract
Influence functions are commonly used to attribute model behavior to training documents. We explore the reverse: crafting training data that induces model behavior. Our framework, Infusion, uses scalable influence-function approximations to compute small perturbations to training documents that induce targeted changes in model behavior through parameter shifts. We evaluate Infusion on data poisoning tasks across vision and language domains. On CIFAR-10, we show that making subtle edits via Infusion to just 0.2% (100/45,000) of the training documents can be competitive with the baseline of inserting a small number of explicit behavior examples. We also find that Infusion transfers across architectures (ResNet $\leftrightarrow$ CNN), suggesting a single poisoned corpus can affect multiple independently trained models. In preliminary language experiments, we characterize when our approach increases the probability of target behaviors and when it fails, finding it most effective at amplifying behaviors the model has already learned. Taken together, these results show that small, subtle edits to training data can systematically shape model behavior, underscoring the importance of training data interpretability for adversaries and defenders alike. We provide the code here: https://github.com/jrosseruk/infusion.
Problem

Research questions and friction points this paper is trying to address.

influence functions
training data editing
model behavior shaping
data poisoning
adversarial machine learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

influence functions
training data editing
data poisoning
model behavior shaping
cross-architecture transfer