Steering LLM Reasoning Through Bias-Only Adaptation

📅 2025-05-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates whether the reasoning capabilities of large language models (LLMs) are inherently encoded in their pretrained weights, rather than being newly constructed by reinforcement-learning (RL) fine-tuning. To this end, we propose a bias-only adaptation method that introduces layer-wise learnable steering vectors, leaving all original model parameters frozen, to activate and amplify preexisting reasoning pathways. Evaluated on GSM8K and MATH, this approach matches, and in several cases surpasses, the performance of full-parameter RL fine-tuning across four base LLMs. Logit-lens analysis further shows that the trained vectors consistently strengthen activations for logical connectives and tokens associated with structured linguistic representations. Our study provides the first empirical evidence supporting the hypothesis that core reasoning abilities are latent in pretrained weights, and establishes lightweight, interpretable, parameter-efficient steering as a viable reasoning-augmentation strategy.

📝 Abstract
Recent work on reasoning-oriented language models, exemplified by o1-like systems, suggests that reinforcement-learning (RL) fine-tuning does not create new capabilities but instead strengthens reasoning patterns already latent in the pretrained network. We test this claim by training steering vectors: layer-wise biases that additively amplify selected hidden features while leaving all original weights unchanged. Experiments on four base models across the GSM8K and MATH benchmarks show that steering vectors recover, and in several cases exceed, the accuracy of fully tuned counterparts. This result supports the view that the required reasoning skills pre-exist in the base model. Further, logit-lens analysis reveals that the trained vectors consistently boost token groups linked to structured languages and logical connectors, providing an interpretable account that aligns with the demands of quantitative reasoning tasks.
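The mechanism the abstract describes, a per-layer additive bias trained while every pretrained weight stays frozen, can be sketched in a few lines of plain Python. Names such as `SteeredLayer` and `matvec` are illustrative assumptions, not identifiers from the paper:

```python
# Minimal sketch of bias-only steering: each layer keeps its frozen
# weight matrix W and gains a learnable steering vector b that is
# added to the layer's output. Only b would be updated in training.

def matvec(W, x):
    """Plain matrix-vector product over nested lists."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

class SteeredLayer:
    def __init__(self, W):
        self.W = W                  # frozen pretrained weights
        self.b = [0.0] * len(W)     # steering vector: the only trained parameters

    def forward(self, x):
        h = matvec(self.W, x)       # original computation, unchanged
        return [h_i + b_i for h_i, b_i in zip(h, self.b)]  # additive steering

layer = SteeredLayer([[1.0, 0.0], [0.0, 1.0]])  # identity weights for the demo
layer.b = [0.5, -0.5]                            # pretend these were learned
print(layer.forward([1.0, 2.0]))                 # [1.5, 1.5]
```

With `b` at its zero initialization the layer reproduces the base model exactly, which is why this kind of intervention can only amplify or suppress features the pretrained network already computes.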
Problem

Research questions and friction points this paper is trying to address.

Investigates whether bias-only adaptation can steer LLM reasoning as effectively as full RL fine-tuning
Tests whether pretrained models already possess the required reasoning skills
Explores the interpretable impact of steering vectors on logical token groups
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer-wise steering vectors additively amplify selected hidden features
Bias-only adaptation boosts reasoning while all original weights stay frozen
Logit-lens analysis interprets which token groups the vectors boost
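The logit-lens technique behind the interpretability claim can be illustrated directly: project an intermediate hidden state through the unembedding matrix and rank vocabulary tokens by the resulting logits; comparing the steered and unsteered states then shows which token groups the bias promotes. A toy pure-Python sketch with a made-up three-token vocabulary (all values illustrative):

```python
def logit_lens(h, unembed, vocab):
    """Project a hidden state through the unembedding matrix and
    return (token, logit) pairs ranked by descending logit."""
    logits = [sum(u_j * h_j for u_j, h_j in zip(row, h)) for row in unembed]
    return sorted(zip(vocab, logits), key=lambda t: -t[1])

vocab = ["therefore", "cat", "hence"]
unembed = [[1.0, 0.0],   # "therefore" reads off hidden dim 0
           [0.0, 1.0],   # "cat" reads off hidden dim 1
           [0.9, 0.1]]   # "hence" mostly dim 0

h_base    = [0.1, 0.9]                            # hidden state before steering
steering  = [0.8, -0.3]                           # learned bias for this layer
h_steered = [a + b for a, b in zip(h_base, steering)]

print(logit_lens(h_base, unembed, vocab)[0][0])     # "cat"
print(logit_lens(h_steered, unembed, vocab)[0][0])  # "therefore"
```

In the paper's setting the same comparison is run over the real vocabulary, and the tokens whose logits rise under steering turn out to cluster around logical connectives and structured-language symbols.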