FineSteer: A Unified Framework for Fine-Grained Inference-Time Steering in Large Language Models

📅 2026-04-16
📈 Citations: 0
Influential: 0

🤖 AI Summary
Large language models are prone to safety violations and hallucinations at inference time, and existing steering methods struggle to be effective, utility-preserving, and training-efficient all at once. This work proposes FineSteer, a framework that decomposes inference-time steering into two stages: conditional steering and fine-grained vector synthesis. Subspace-guided Conditional Steering (SCS) decides when to steer, so that unnecessary interventions are skipped, while a Mixture-of-Steering-Experts (MoSE) mechanism generates query-specific steering vectors. FineSteer does not update model parameters and, on safety and truthfulness benchmarks, outperforms state-of-the-art methods in overall performance with minimal utility loss.

📝 Abstract
Large language models (LLMs) often exhibit undesirable behaviors, such as safety violations and hallucinations. Although inference-time steering offers a cost-effective way to adjust model behavior without updating its parameters, existing methods often fail to be simultaneously effective, utility-preserving, and training-efficient due to their rigid, one-size-fits-all designs and limited adaptability. In this work, we present FineSteer, a novel steering framework that decomposes inference-time steering into two complementary stages: conditional steering and fine-grained vector synthesis, allowing fine-grained control over when and how to steer internal representations. In the first stage, we introduce a Subspace-guided Conditional Steering (SCS) mechanism that preserves model utility by avoiding unnecessary steering. In the second stage, we propose a Mixture-of-Steering-Experts (MoSE) mechanism that captures the multimodal nature of desired steering behaviors and generates query-specific steering vectors for improved effectiveness. Through tailored designs in both SCS and MoSE, FineSteer maintains robust performance on general queries while adaptively optimizing steering vectors for targeted inputs in a training-efficient manner. Extensive experiments on safety and truthfulness benchmarks show that FineSteer outperforms state-of-the-art methods in overall performance, achieving stronger steering performance with minimal utility loss. Code is available at https://github.com/YukinoAsuna/FineSteer
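The two-stage idea in the abstract (first decide when to steer, then decide how) can be illustrated with a small, hedged sketch. This is not the authors' implementation: the gating rule (the fraction of a hidden state's energy inside an orthonormal "target" subspace), the softmax routing over expert keys, and all names (`subspace_energy`, `steer`, `expert_keys`, `expert_vectors`, `threshold`, `strength`) are assumptions made for illustration only. The real formulation is in the paper and the linked repository.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def subspace_energy(h, basis):
    """Fraction of h's squared norm that lies in the span of `basis`.

    Assumes `basis` is a list of orthonormal vectors (an illustrative choice).
    """
    total = dot(h, h)
    if total == 0.0:
        return 0.0
    return sum(dot(h, b) ** 2 for b in basis) / total


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def steer(h, basis, expert_keys, expert_vectors, threshold=0.5, strength=1.0):
    """Toy two-stage steering of a single hidden state h.

    Stage 1 (conditional steering, hypothetical rule): return h unchanged
    unless enough of its energy falls inside the target subspace.
    Stage 2 (mixture of steering experts, hypothetical routing): weight each
    expert vector by a softmax over key/hidden-state similarities and add
    the weighted sum to h.
    """
    if subspace_energy(h, basis) < threshold:
        return list(h)  # general query: no intervention
    weights = softmax([dot(h, k) for k in expert_keys])
    delta = [
        sum(w * v[i] for w, v in zip(weights, expert_vectors))
        for i in range(len(h))
    ]
    return [x + strength * d for x, d in zip(h, delta)]
```

In this toy setup, a hidden state orthogonal to the target subspace passes through unchanged, which mirrors the utility-preservation goal, while a state inside the subspace receives a blend of expert vectors whose weights depend on the state itself, which mirrors the query-specific synthesis goal.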
Problem

Research questions and friction points this paper is trying to address.

inference-time steering
large language models
safety violations
hallucinations
utility preservation
Innovation

Methods, ideas, or system contributions that make the work stand out.

inference-time steering
fine-grained control
Subspace-guided Conditional Steering
Mixture-of-Steering-Experts
large language models