Uncovering Latent Chain of Thought Vectors in Language Models

📅 2024-09-21
🏛️ arXiv.org
📈 Citations: 0 (Influential: 0)
🤖 AI Summary
Large language models (LLMs) often lack controllable chain-of-thought (CoT) reasoning, undermining their trustworthiness in societally impactful applications. Method: We propose a prompt-free, fine-tuning-free latent-space intervention method that identifies and exploits linearly separable CoT directions within LLMs’ internal representations. By constructing task-specific steering vectors, we directly modulate hidden states during forward propagation. Contribution/Results: Evaluated on Llama3-8B and Mistral-7B-v0.2, our approach enables zero-shot, lightweight, and generalizable CoT control—matching standard CoT prompting performance on GSM8k, MMLU, AGIEval, and ARC, while achieving high reasoning consistency and significantly lower computational overhead than fine-tuning. Crucially, we uncover the intrinsic geometric structure of CoT representations and establish the first interpretable, intervention-based reasoning control paradigm grounded in latent-space directional manipulation.
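The summary describes building a task-specific steering vector from the model's internal representations and adding it to hidden states during forward propagation. A common way to construct such a vector is as the difference of mean activations between CoT-style and plain prompts. The sketch below uses random stand-in activations (hypothetical data, not outputs of Llama3-8B or Mistral-7B-v0.2) and an assumed scaling factor `alpha`:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy hidden states at one layer: rows are prompts, columns are features.
# In the paper's setting these would be extracted from the LLM; here they
# are random stand-ins centered at different means so the two sets differ.
hidden_dim = 8
h_cot = rng.normal(loc=1.0, size=(16, hidden_dim))    # CoT-style prompts
h_plain = rng.normal(loc=0.0, size=(16, hidden_dim))  # plain prompts

# Steering vector: difference of mean activations between the two sets.
v = h_cot.mean(axis=0) - h_plain.mean(axis=0)

def steer(h, v, alpha=1.0):
    """Shift a hidden state along the CoT direction during the forward pass."""
    return h + alpha * v

h = rng.normal(size=hidden_dim)
h_steered = steer(h, v, alpha=2.0)
```

The intervention is purely additive, so `h_steered - h` equals `alpha * v` exactly; no prompting or fine-tuning is involved, which is what makes the approach lightweight.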

📝 Abstract
As language models grow more influential and trusted in our society, our ability to reliably steer them toward favorable behaviors becomes increasingly paramount. For this, we investigate the technique of steering vectors: biasing the forward pass of language models using a "steering vector" derived from a specific task. We apply them to steer language models toward performing Chain of Thought (CoT) Reasoning without the need to prompt through natural language. We demonstrate this approach on Llama3 8b and Mistral 7b v0.2, and obtain competitive results compared to CoT-prompted performances on a series of reasoning benchmarks (GSM8k, MMLU, AGI Eval, ARC AI2) and qualitative examples. We find this approach yields consistent steering towards CoT responses and takes less compute than traditional methods of fine-tuning models towards CoT.
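The abstract's "biasing the forward pass" step is typically applied at a chosen layer during generation. A minimal PyTorch sketch of that mechanism, using a single linear layer as a stand-in for a transformer block and a hypothetical steering vector `v` (the real method derives `v` from the model's own activations):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in for one transformer layer (the paper intervenes inside
# Llama3 8b / Mistral 7b v0.2; this is illustrative only).
layer = nn.Linear(8, 8)

# Hypothetical CoT steering vector in this layer's output space.
v = torch.randn(8)

def add_steering(module, inputs, output, alpha=1.5):
    # Returning a tensor from a forward hook replaces the layer's output,
    # shifting activations along the CoT direction mid-forward-pass.
    return output + alpha * v

handle = layer.register_forward_hook(add_steering)

x = torch.randn(1, 8)
steered = layer(x)   # forward pass with the steering bias applied
handle.remove()
plain = layer(x)     # same input, unmodified forward pass
```

Because the hook only adds `alpha * v`, removing the handle fully restores the model, which is why this kind of intervention is cheaper than fine-tuning toward CoT behavior.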
Problem

Research questions and friction points this paper is trying to address.

Language Models
Chain-of-Thought Reasoning
Social Applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Steering Vectors
Chain-of-Thought Reasoning
Computational Efficiency