Focus On This, Not That! Steering LLMs With Adaptive Feature Specification

📅 2024-10-30
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) trained with instruction tuning often still rely on spurious or biased features picked up from their training data, leading to unstable and unfair behaviour when deployed in new contexts. Focus Instruction Tuning (FIT) addresses this by training LLMs to condition their responses on natural-language focus instructions that specify which input features to attend to and which to ignore. At inference time, a focus-tuned model can be steered adaptively without further training: focusing on task-causal features while ignoring spurious ones improves robustness, and ignoring demographic attributes mitigates social bias. Experiments across several settings show that FIT also generalises under distribution shift and to features unseen during training, supporting more robust, fair, and controllable LLM applications in real-world environments.

📝 Abstract
Despite the success of Instruction Tuning (IT) in training large language models (LLMs) to perform arbitrary user-specified tasks, these models often still leverage spurious or biased features learned from their training data, leading to undesired behaviours when deploying them in new contexts. In this work, we introduce Focus Instruction Tuning (FIT), which trains LLMs to condition their responses by focusing on specific features whilst ignoring others, leading to different behaviours based on what features are specified. Across several experimental settings, we show that focus-tuned models can be adaptively steered by focusing on different features at inference-time: for instance, robustness can be improved by focusing on task-causal features and ignoring spurious features, and social bias can be mitigated by ignoring demographic categories. Furthermore, FIT can steer behaviour in new contexts, generalising under distribution shift and to new unseen features at inference time, and thereby facilitating more robust, fair, and controllable LLM applications in real-world environments.
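The inference-time steering the abstract describes amounts to wrapping a task query with explicit focus/ignore instructions. A minimal sketch of that idea (the `build_focus_prompt` helper and its wording are illustrative assumptions, not the paper's actual prompt template):

```python
def build_focus_prompt(task_prompt, focus_on=None, ignore=None):
    """Wrap a task prompt with natural-language focus instructions.

    Hypothetical helper illustrating prompt-level feature specification
    in the spirit of FIT; the paper's exact template may differ.
    """
    parts = [task_prompt]
    if focus_on:
        # Features the model should treat as task-causal.
        parts.append("Focus on the following features when answering: "
                     + ", ".join(focus_on) + ".")
    if ignore:
        # Features the model should treat as spurious or bias-inducing.
        parts.append("Ignore the following features when answering: "
                     + ", ".join(ignore) + ".")
    return "\n".join(parts)


prompt = build_focus_prompt(
    "Classify the sentiment of this movie review.",
    focus_on=["the reviewer's opinion of the film"],
    ignore=["the genre keyword mentioned in the review"],  # a spurious cue
)
print(prompt)
```

Swapping the `focus_on` and `ignore` lists at inference time is what lets a single focus-tuned model exhibit different behaviours on the same input, with no parameter updates between queries.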
Problem

Research questions and friction points this paper is trying to address.

Large Models
Bias
Adaptability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Focus Instruction Tuning
Generalization Enhancement
Bias Reduction