Sparse Activation Editing for Reliable Instruction Following in Narratives

📅 2025-05-22

🤖 AI Summary

Existing benchmarks do not adequately expose how unreliably language models follow instructions in complex narrative contexts. To address this, we propose Concise-SAE, a training-free, annotation-free framework for editing sparsely activated representations. It identifies instruction-relevant neurons via causal mediation analysis and applies zero-shot, targeted interventions specified directly by natural language instructions. We introduce FreeInstruct, the first narrative-oriented instruction-following benchmark (1,212 instances), and show that Concise-SAE outperforms state-of-the-art methods: instruction adherence improves by up to 37.2% while generation fluency and factual consistency are preserved. It also generalizes well to non-narrative tasks. The core contribution is an "instruction-driven sparse editing" paradigm that enables efficient, interpretable, training-free intervention on a model's internals.

📝 Abstract

Complex narrative contexts often challenge language models' ability to follow instructions, and existing benchmarks fail to capture these difficulties. To address this, we propose Concise-SAE, a training-free framework that improves instruction following by identifying and editing instruction-relevant neurons using only natural language instructions, without requiring labelled data. To thoroughly evaluate our method, we introduce FreeInstruct, a diverse and realistic benchmark of 1,212 examples that highlights the challenges of instruction following in narrative-rich settings. While initially motivated by complex narratives, Concise-SAE demonstrates state-of-the-art instruction adherence across varied tasks without compromising generation quality.
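
To make the idea of editing a few instruction-relevant features concrete, here is a minimal toy sketch in plain Python. It is not the Concise-SAE implementation: the abstract does not give the selection or editing rule, so this sketch assumes a simple stand-in, ranking sparse-autoencoder features by how much their activation rises when the instruction is present and then amplifying only those features. All names (`encode`, `top_k_features`, `edit_hidden`), the random weights, and the `scale` parameter are hypothetical.

```python
import random

def matvec(matrix, vec):
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]

def encode(enc, h):
    """Map a hidden state to non-negative sparse-autoencoder feature activations (ReLU)."""
    return [max(0.0, x) for x in matvec(enc, h)]

def top_k_features(z_with, z_without, k):
    """Indices of the k features whose activation rises most with the instruction.

    Assumption: a plain activation difference stands in for a proper
    relevance analysis; it only illustrates the selection step.
    """
    diffs = [a - b for a, b in zip(z_with, z_without)]
    return sorted(range(len(diffs)), key=lambda i: diffs[i], reverse=True)[:k]

def edit_hidden(h, dec, z, selected, scale):
    """Amplify only the selected features by `scale`, leaving all others untouched.

    dec[i] is the (hypothetical) decoder direction of feature i in model space.
    """
    out = list(h)
    for i in selected:
        gain = (scale - 1.0) * z[i]  # extra contribution beyond what h already carries
        for d in range(len(out)):
            out[d] += gain * dec[i][d]
    return out

# Toy setup: random weights stand in for a trained sparse autoencoder.
rng = random.Random(0)
d_model, n_features = 8, 32
enc = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n_features)]
dec = [[rng.gauss(0, 1) for _ in range(d_model)] for _ in range(n_features)]

h_without = [rng.gauss(0, 1) for _ in range(d_model)]         # prompt without the instruction
h_with = [x + rng.gauss(0, 0.5) for x in h_without]           # same prompt with the instruction

z_without = encode(enc, h_without)
z_with = encode(enc, h_with)
selected = top_k_features(z_with, z_without, k=3)
h_edited = edit_hidden(h_with, dec, z_with, selected, scale=2.0)
```

In a real model, `h_with` and `h_without` would be hidden states captured at one layer, and `h_edited` would be written back into the forward pass. Because only `k` features are touched, the edit stays sparse and each change can be traced to a specific feature.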

Problem

Research questions and friction points this paper is trying to address.

Improves instruction following in complex narratives

Edits instruction-relevant neurons without labeled data

Introduces benchmark for narrative-rich instruction challenges

Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free framework for instruction adherence

Edits instruction-relevant neurons using only natural language instructions

State-of-the-art performance without quality loss
