🤖 AI Summary
Existing emotion recognition in conversation (ERC) approaches predominantly rely on multi-stage instruction tuning, which struggles to jointly model the dynamic interaction between speaker characteristics and conversational context, resulting in weak alignment among speaker identity, context, and emotion. To address this, we propose InitERC, a single-stage in-context instruction tuning framework that unifies the modeling of speakers, context, and emotions for end-to-end alignment. Its core components are: (i) adaptive selection of in-context examples from a curated demonstration pool; (ii) a structured prompt template that explicitly encodes speaker-context-emotion relationships; and (iii) an in-context instruction tuning mechanism through which the model learns speaker-context-emotion alignment directly from context examples. Evaluated on three benchmark datasets (MELD, EmoryNLP, and IEMOCAP), InitERC consistently outperforms state-of-the-art methods, demonstrating that both the quality and the organization of contextual demonstrations substantially affect ERC performance.
📝 Abstract
Emotion recognition in conversation (ERC) aims to identify the emotion of each utterance in a conversation and plays a vital role in empathetic artificial intelligence. With the rapid growth of large language models (LLMs), instruction tuning has emerged as a critical paradigm for ERC. Existing studies mainly focus on multi-stage instruction tuning, which first endows LLMs with speaker characteristics and then conducts context-aware instruction tuning to comprehend emotional states. However, these methods inherently constrain the capacity to jointly capture the dynamic interaction between speaker characteristics and conversational context, resulting in weak alignment among speaker identity, contextual cues, and emotional states within a unified framework. In this paper, we propose InitERC, a simple yet effective one-stage in-context instruction tuning framework for ERC. InitERC adapts LLMs to learn speaker-context-emotion alignment from context examples via in-context instruction tuning. Specifically, InitERC comprises four components: demonstration pool construction, in-context example selection, prompt template design, and in-context instruction tuning. To explore the impact of in-context examples, we conduct a comprehensive study of three key factors: retrieval strategy, example ordering, and the number of examples. Extensive experiments on three widely used datasets demonstrate that InitERC achieves substantial improvements over state-of-the-art baselines.
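To make the pipeline concrete, the sketch below illustrates one plausible reading of the in-context example selection and prompt assembly steps: retrieve the k demonstrations most similar to the target utterance (here by cosine similarity over precomputed embeddings, a common retrieval strategy; the abstract does not specify which one InitERC uses), order them, and fill a speaker-aware prompt template. All function names, the template wording, and the ordering choice are illustrative assumptions, not the paper's actual implementation.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors (lists of floats).
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def select_demonstrations(query_emb, pool, k=3, most_similar_last=True):
    """Pick the k pool entries most similar to the query and order them.

    pool: list of (embedding, demonstration_text) pairs.
    most_similar_last places the closest example nearest the target
    utterance in the prompt -- one possible ordering; the paper studies
    example ordering as a separate factor.
    """
    ranked = sorted(pool, key=lambda p: cosine(query_emb, p[0]), reverse=True)
    top_k = ranked[:k]
    if most_similar_last:
        top_k = top_k[::-1]
    return [text for _, text in top_k]

def build_prompt(demos, speaker, utterance):
    # Hypothetical template encoding the speaker-context-emotion structure.
    header = "Identify the emotion of the target utterance.\n\n"
    body = "\n".join(demos)
    target = f"\nSpeaker {speaker}: {utterance}\nEmotion:"
    return header + body + target
```

During one-stage training, each instruction-tuning instance would be a prompt assembled this way, so the model sees speaker, context, and emotion jointly rather than in separate stages.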