🤖 AI Summary
To address the weak personalization adaptability and low data efficiency of large language models (LLMs) under dynamic user preferences and high data sparsity, this paper proposes a fine-grained, instance-level inference-time steering framework. The method hooks activations from internal attention and MLP layers, then employs an input-aware aggregation mechanism to dynamically generate sample-specific, non-parametric steering vectors that are injected into the forward pass, enabling efficient, context-sensitive personalization. Its core contributions are: (1) fine-grained modeling of activations across layers; (2) input-driven adaptive aggregation; and (3) orthogonality to existing methods, enabling plug-and-play integration. Experiments across diverse tasks, including short and long text generation and web function calling, demonstrate significant improvements in personalization performance. Notably, the framework maintains robustness and generalization under rapid shifts in user distributions and heterogeneous interaction patterns, even with limited user feedback.
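The input-aware aggregation idea can be sketched in plain Python: per-layer steering signals are weighted by their (softmax-normalized) similarity to the current input's representation, yielding one sample-specific vector. This is an illustrative toy, not the paper's implementation; the function names and the dot-product similarity are assumptions.

```python
import math

def dot(u, v):
    # Plain dot product between two equal-length vectors.
    return sum(a * b for a, b in zip(u, v))

def input_aware_aggregate(layer_signals, input_repr):
    """Combine per-layer steering signals into one sample-specific vector.

    Each signal is weighted by the softmax of its similarity to the
    input representation, so the mix adapts to the current sample.
    (Hypothetical sketch; similarity choice is an assumption.)
    """
    scores = [dot(sig, input_repr) for sig in layer_signals]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(layer_signals[0])
    # Weighted sum of the layer signals, dimension by dimension.
    return [sum(w * sig[d] for w, sig in zip(weights, layer_signals))
            for d in range(dim)]
```

With one-hot signals `[[1, 0], [0, 1]]` and input `[1, 0]`, the first signal gets the larger weight, so the aggregated vector leans toward it.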
📝 Abstract
The rapid evolution of large language models (LLMs) has intensified the demand for effective personalization techniques that can adapt model behavior to individual user preferences. Beyond non-parametric methods that exploit the in-context learning ability of LLMs, parametric adaptation methods have recently emerged, including personalized parameter-efficient fine-tuning and reward modeling. However, these methods struggle with dynamic user patterns and high data sparsity due to their limited adaptability and data efficiency. To address these challenges, we propose a fine-grained, instance-tailored steering framework that dynamically generates sample-level steering vectors from user data and injects them into the model's forward pass for personalized adaptation. Our approach introduces two key technical innovations: a fine-grained steering component that captures nuanced signals by hooking activations from attention and MLP layers, and an input-aware aggregation module that synthesizes these signals into contextually relevant enhancements. The method offers high flexibility and data efficiency, excelling in scenarios with fast-shifting distributions and high data sparsity. In addition, it is orthogonal to existing methods and operates as a plug-in component compatible with different personalization techniques. Extensive experiments across diverse scenarios--including short-to-long text generation and web function calling--validate the effectiveness and compatibility of our approach. Results show that our method significantly enhances personalization performance in fast-shifting environments while maintaining robustness across varying interaction modes and context lengths. Implementation is available at https://github.com/KounianhuaDu/Fints.
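The injection mechanism described above can be illustrated with a minimal toy model: a hook registered on a layer adds a sample-specific steering vector to that layer's activations during the forward pass, leaving the model's parameters untouched. This is a hedged sketch of the general activation-steering pattern (in real LLM code this would use framework forward hooks, e.g. in PyTorch); `ToyLayer`, `make_steering_hook`, and `alpha` are all hypothetical names.

```python
class ToyLayer:
    """A stand-in for one transformer sub-layer (attention or MLP)."""

    def __init__(self, weight):
        self.weight = weight  # a single scalar weight, for simplicity
        self.hooks = []       # functions applied to the layer's output

    def forward(self, x):
        # Base computation: scale every activation by the layer weight.
        out = [self.weight * v for v in x]
        # Apply registered hooks to the activations (steering happens here).
        for hook in self.hooks:
            out = hook(out)
        return out

def make_steering_hook(steer_vec, alpha=1.0):
    """Return a hook that injects `alpha * steer_vec` into the activations."""
    def hook(activations):
        return [a + alpha * s for a, s in zip(activations, steer_vec)]
    return hook

# Attach a sample-specific steering vector and run one forward pass.
layer = ToyLayer(weight=2.0)
layer.hooks.append(make_steering_hook([0.5, -0.5]))
print(layer.forward([1.0, 1.0]))  # [2.5, 1.5]
```

Because steering lives in a hook rather than in the weights, it can be enabled per sample and removed without touching the model, which is what makes this style of intervention plug-and-play alongside other personalization techniques.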