AI Summary
This work addresses a key challenge in efficient fine-tuning of large language models: selecting the most effective training samples under a limited data budget to steer the model toward a target behavior. The authors propose PRISM, a novel method that, for the first time, incorporates the model's current preference into the construction of the target-behavior representation. Using preference-weighted influence functions, PRISM quantifies the alignment between each training sample and the desired behavior, enabling more accurate data selection. A first-order optimization analysis supports this weighting, showing that it yields a more effective update direction for increasing target-behavior preference. Experimental results show that PRISM consistently improves supervised fine-tuning performance across model families and scales, including in safety-oriented SFT repair.
Abstract
As LLMs continue to scale, improving training efficiency increasingly depends on using data more effectively. Data selection addresses this problem by allocating a limited training budget to samples that best promote a target behavior. Existing methods usually represent the target behavior with a set of target examples, but often treat these examples as equally important. This can be inefficient because target examples may differ in their relevance to the current model: examples closer to the model's current behavior provide more actionable guidance than those farther away. We propose PRISM (PReference-aware Influence-function-based Data Selection Method for Efficient Fine-Tuning), which uses the current model's preference to weight target examples and construct a preference-aware target representation. PRISM then scores candidate training samples by their alignment with this representation, concentrating the data budget on samples more likely to move the model toward the target behavior. Theoretical analysis shows that this preference weighting yields a more effective first-order direction for increasing target-behavior preference. Experiments across model families and scales show that PRISM improves both efficient fine-tuning and safety-oriented SFT repair, demonstrating that precise target-behavior characterization is key to budget-efficient data selection.
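The selection pipeline described in the abstract (weight target examples by model preference, build one target representation, score candidates by alignment) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. Several details are assumptions made for illustration: the weights are a softmax over hypothetical per-example preference scores, each example is represented by a precomputed gradient feature vector, and alignment is a plain inner product, which corresponds to an influence function with the Hessian replaced by the identity. The function names and the `temperature` parameter are likewise hypothetical.

```python
import numpy as np


def preference_weights(pref_scores, temperature=1.0):
    """Turn per-example preference scores into weights that sum to 1.

    Assumption: a softmax is used here for illustration; the source does
    not specify the weighting function.
    """
    z = np.asarray(pref_scores, dtype=float) / temperature
    z = z - z.max()  # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()


def target_representation(target_grads, weights):
    """Preference-weighted average of target-example gradient features."""
    return np.asarray(weights) @ np.asarray(target_grads, dtype=float)


def select_top_k(candidate_grads, target_rep, k):
    """Score candidates by inner product with the target representation
    and return the indices of the k highest-scoring candidates."""
    scores = np.asarray(candidate_grads, dtype=float) @ target_rep
    order = np.argsort(-scores, kind="stable")[:k]
    return order, scores


# Toy data (hypothetical): two target examples, three candidates, 2-D features.
target_grads = np.array([[1.0, 0.0], [0.0, 1.0]])
pref_scores = np.array([2.0, 0.0])  # the model currently prefers target 0
candidate_grads = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])

weights = preference_weights(pref_scores)
target_rep = target_representation(target_grads, weights)
selected, scores = select_top_k(candidate_grads, target_rep, k=2)
print("weights:", weights)
print("selected candidate indices:", selected)
```

With uniform weights, candidates 0 and 1 would score equally in this toy setup. The preference weighting breaks the tie in favor of the candidate aligned with the target example the model already prefers, which is the budget-concentration effect the abstract describes.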