From Syntax to Emotion: A Mechanistic Analysis of Emotion Inference in LLMs

📅 2026-04-28
📈 Citations: 0
Influential: 0

🤖 AI Summary
How large language models (LLMs) internally represent emotion recognition remains poorly understood. This work presents a systematic characterization of those internal representations, using sparse autoencoders (SAEs) and phase-stratified causal tracing to uncover a consistent three-phase information flow in which emotion-related features emerge only in the final phase. The analysis identifies both features shared across emotions and emotion-specific features, and finds that the number and causal impact of influential features vary by emotion; Disgust in particular is more weakly and diffusely represented than other emotions. Building on these insights, the authors propose an interpretable, data-efficient causal feature steering method. It significantly improves emotion recognition across multiple models while largely preserving language modeling ability, and the gains generalize across multiple emotion recognition datasets.
📝 Abstract
Large language models (LLMs) are increasingly used in emotionally sensitive human-AI applications, yet little is known about how emotion recognition is internally represented. In this work, we investigate the internal mechanisms of emotion recognition in LLMs using sparse autoencoders (SAEs). By analyzing sparse feature activations across layers, we identify a consistent three-phase information flow, in which emotion-related features emerge only in the final phase. We further show that emotion representations comprise both shared features across emotions and emotion-specific features. Using phase-stratified causal tracing, we identify a small set of features that strongly influence emotion predictions, and show that both their number and causal impact vary across emotions; in particular, Disgust is more weakly and diffusely represented than other emotions. Finally, we propose an interpretable and data-efficient causal feature steering method that significantly improves emotion recognition performance across multiple models while largely preserving language modeling ability, and demonstrate that these improvements generalize across multiple emotion recognition datasets. Overall, our findings provide a systematic analysis of the internal mechanisms underlying emotion recognition in LLMs and introduce an efficient, interpretable, and controllable approach for improving model performance.
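The abstract names three ingredients: SAE feature activations, causal tracing of individual features, and feature steering. The sketch below illustrates how these generally fit together on a toy two-dimensional activation. It is not the authors' implementation: every function name, the identity weights, and the linear `readout` standing in for an emotion logit are assumptions made purely for illustration, and the page does not specify how the paper selects layers, features, or steering strength.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def sae_encode(x: Vector, w_enc: Matrix, b_enc: Vector) -> Vector:
    """Sparse feature activations: ReLU(W_enc x + b_enc)."""
    return [max(0.0, p + b) for p, b in zip(matvec(w_enc, x), b_enc)]


def ablate_feature(x: Vector, f: Vector, w_dec: Matrix, k: int) -> Vector:
    """Remove feature k's contribution (f[k] * decoder direction k) from x."""
    return [xi - f[k] * d for xi, d in zip(x, w_dec[k])]


def steer(x: Vector, w_dec: Matrix, k: int, alpha: float) -> Vector:
    """Add alpha times the decoder direction of feature k to x."""
    return [xi + alpha * d for xi, d in zip(x, w_dec[k])]


def score(readout: Vector, x: Vector) -> float:
    """Hypothetical linear readout standing in for an emotion logit."""
    return sum(r * xi for r, xi in zip(readout, x))


def causal_effect(x: Vector, f: Vector, w_dec: Matrix, k: int,
                  readout: Vector) -> float:
    """Drop in the readout score when feature k is ablated."""
    return score(readout, x) - score(readout, ablate_feature(x, f, w_dec, k))


# Toy setup (assumed values): identity encoder/decoder, two features.
W_ENC: Matrix = [[1.0, 0.0], [0.0, 1.0]]
B_ENC: Vector = [0.0, 0.0]
W_DEC: Matrix = [[1.0, 0.0], [0.0, 1.0]]  # W_DEC[k] is feature k's direction
READOUT: Vector = [2.0, 0.0]              # only dimension 0 affects the score

x = [1.0, 0.5]
f = sae_encode(x, W_ENC, B_ENC)
effects = [causal_effect(x, f, W_DEC, k, READOUT) for k in range(len(f))]
top_feature = max(range(len(effects)), key=lambda k: effects[k])
steered = steer(x, W_DEC, top_feature, alpha=0.5)
```

In this toy case only feature 0 changes the readout when ablated, so it is the one chosen for steering. In a real model the same pattern would typically be applied to residual-stream activations at a chosen layer, with the downstream model output in place of the linear readout.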
Problem

Research questions and friction points this paper is trying to address.

emotion recognition
large language models
internal representation
emotion inference
mechanistic analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

sparse autoencoders
emotion recognition
causal tracing
feature steering
interpretable AI