Do LLMs "Feel"? Emotion Circuits Discovery and Control

📅 2025-10-13
🤖 AI Summary
This study asks whether large language models (LLMs) contain context-agnostic internal mechanisms that shape emotional expression, and whether those mechanisms can be used to control emotion in generated text. It addresses three questions: (1) whether such mechanisms exist independently of context, (2) what form they take inside the model, and (3) whether they support universal emotion control. The authors construct a controlled dataset, SEV (Scenario-Event with Valence), extract context-agnostic emotion directions, and use analytical decomposition and causal analysis to identify the neurons and attention heads that locally implement emotional computation. They validate these components with ablation and enhancement interventions, quantify each sublayer's causal influence on the final emotion representation, and assemble the components into global emotion circuits. Directly modulating these circuits achieves 99.65% emotion-expression accuracy on the test set, surpassing prompting- and steering-based methods. The authors describe the work as, to their knowledge, the first systematic study to uncover and validate emotion circuits in LLMs.

📝 Abstract
As the demand for emotional intelligence in large language models (LLMs) grows, a key challenge lies in understanding the internal mechanisms that give rise to emotional expression and in controlling emotions in generated text. This study addresses three core questions: (1) Do LLMs contain context-agnostic mechanisms shaping emotional expression? (2) What form do these mechanisms take? (3) Can they be harnessed for universal emotion control? We first construct a controlled dataset, SEV (Scenario-Event with Valence), to elicit comparable internal states across emotions. Subsequently, we extract context-agnostic emotion directions that reveal consistent, cross-context encoding of emotion (Q1). We identify neurons and attention heads that locally implement emotional computation through analytical decomposition and causal analysis, and validate their causal roles via ablation and enhancement interventions. Next, we quantify each sublayer's causal influence on the model's final emotion representation and integrate the identified local components into coherent global emotion circuits that drive emotional expression (Q2). Directly modulating these circuits achieves 99.65% emotion-expression accuracy on the test set, surpassing prompting- and steering-based methods (Q3). To our knowledge, this is the first systematic study to uncover and validate emotion circuits in LLMs, offering new insights into interpretability and controllable emotional intelligence.
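To make the idea of a context-agnostic emotion direction (Q1) concrete, here is a minimal sketch on synthetic data. It is not the authors' implementation: the abstract does not specify the estimator, so the context-centered difference of means used here is an assumption, and `emotion_directions` and `make_toy_data` are hypothetical helper names.

```python
import numpy as np


def emotion_directions(hidden, emotions, contexts):
    """Estimate one unit direction per emotion from hidden states.

    hidden:   (n, d) array of activations (assumed already extracted).
    emotions: length-n array of emotion labels.
    contexts: length-n array of scenario/context ids.
    The context-centered mean is an assumed estimator, chosen for illustration.
    """
    hidden = np.asarray(hidden, dtype=float)
    emotions = np.asarray(emotions)
    contexts = np.asarray(contexts)

    # Subtract each context's mean so scenario-specific content cancels out.
    centered = hidden.copy()
    for c in np.unique(contexts):
        mask = contexts == c
        centered[mask] -= hidden[mask].mean(axis=0)

    directions = {}
    for e in np.unique(emotions):
        v = centered[emotions == e].mean(axis=0)
        directions[str(e)] = v / np.linalg.norm(v)
    return directions


def make_toy_data(seed=0, dim=16, n_contexts=6, noise=0.05):
    """Synthetic stand-in for model activations (purely illustrative)."""
    rng = np.random.default_rng(seed)
    names = ["anger", "joy", "sadness"]
    # Orthonormal ground-truth emotion directions.
    q, _ = np.linalg.qr(rng.normal(size=(dim, len(names))))
    true = {n: q[:, i] for i, n in enumerate(names)}

    rows, emo, ctx = [], [], []
    for c in range(n_contexts):
        offset = 3.0 * rng.normal(size=dim)  # context-specific content
        for n in names:
            rows.append(offset + 2.0 * true[n] + noise * rng.normal(size=dim))
            emo.append(n)
            ctx.append(c)
    return np.array(rows), np.array(emo), np.array(ctx), true


if __name__ == "__main__":
    hidden, emo, ctx, true = make_toy_data()
    for name, v in emotion_directions(hidden, emo, ctx).items():
        print(name, {k: round(float(v @ t), 3) for k, t in true.items()})
```

Because each context is centered on its own mean, a recovered direction is measured relative to the average over emotions, so it aligns most strongly with its own ground-truth direction rather than matching it exactly.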
Problem

Research questions and friction points this paper is trying to address.

Discovering context-agnostic emotion mechanisms in LLMs
Identifying neural circuits driving emotional expression
Achieving universal emotion control through circuit modulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracted context-agnostic emotion directions from LLMs
Identified neurons and attention heads forming emotion circuits
Modulated circuits to control emotion in generated text
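The ablation and enhancement interventions behind these contributions can be illustrated with a toy linear layer. This is a sketch under stated assumptions, not the paper's procedure: `scale_neurons`, `emotion_score`, and `rank_neurons` are hypothetical helpers, the layer is a single output matrix `w_out`, and ranking neurons by the alignment of their output weights with an emotion direction is an assumed stand-in for the decomposition and causal analysis the abstract mentions.

```python
import numpy as np


def scale_neurons(activations, neuron_ids, factor):
    """Return a copy with the chosen neurons multiplied by `factor`.

    factor = 0 ablates the neurons; factor > 1 enhances them.
    """
    out = np.array(activations, dtype=float, copy=True)
    out[..., neuron_ids] *= factor
    return out


def emotion_score(activations, w_out, direction):
    """Project the layer output (activations @ w_out) onto a unit direction."""
    return (activations @ w_out) @ direction


def rank_neurons(w_out, direction):
    """Order neurons by |output weights . direction|, largest first (assumed heuristic)."""
    return np.argsort(-np.abs(w_out @ direction))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_neurons, dim = 32, 16
    w_out = rng.normal(size=(n_neurons, dim))
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    acts = np.abs(rng.normal(size=(8, n_neurons)))  # toy post-activation values

    top = rank_neurons(w_out, direction)[:4]
    base = emotion_score(acts, w_out, direction).mean()
    ablated = emotion_score(scale_neurons(acts, top, 0.0), w_out, direction).mean()
    enhanced = emotion_score(scale_neurons(acts, top, 2.0), w_out, direction).mean()
    print(f"base={base:.3f} ablated={ablated:.3f} enhanced={enhanced:.3f}")
```

In this linear toy, doubling a set of neurons shifts the score by exactly as much as ablating them removes. A real transformer is nonlinear downstream of the intervention, so measured effects there need not be symmetric.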
Authors

Chenxi Wang
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
Yixuan Zhang
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
Ruiji Yu
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
Yufei Zheng
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
Lang Gao
MBZUAI
Mechanistic Interpretability, Natural Language Processing
Zirui Song
PhD student at MBZUAI
NLP
Zixiang Xu
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
Gus Xia
Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)
Huishuai Zhang
Peking University
Deep Learning, Optimization, Information Theory
Dongyan Zhao
Peking University
Natural Language Processing, Semantic Data Management, QA, Dialogue System
Xiuying Chen
MBZUAI
Trustworthy NLP, Human-Centered NLP, Computational Social Science