🤖 AI Summary
Current LLM-based dialogue systems tend to be passive and reactive in affective, goal-oriented scenarios such as marketing, and struggle to maintain emotional consistency and persuasive efficacy at the same time. To address this, we propose AffectMind, an affective multimodal dialogue agent that makes marketing conversations proactive and emotion-driven. Our method introduces: (1) a Proactive Knowledge Grounding Network (PKGN) that continuously updates factual and affective context from textual, visual, and prosodic cues; (2) an Emotion–Intent Alignment Model (EIAM) that jointly models user emotion and purchase intent to adapt persuasion strategies; and (3) a Reinforced Discourse Loop (RDL) that optimizes emotional coherence and engagement using reinforcement signals derived from user responses. Evaluated on MM-ConvMarket and AffectPromo, our agent outperforms strong LLM-based baselines with a 26% improvement in emotional consistency, a 19% increase in persuasion success rate, and a 23% gain in long-term user engagement, suggesting that closed-loop co-adaptation of emotion perception and persuasive strategy is a key capability for commercial multimodal agents.
📝 Abstract
Recent advances in large language models (LLMs) have enabled fluent dialogue systems, but most remain reactive and struggle in emotionally rich, goal-oriented settings such as marketing conversations. To address this limitation, we propose AffectMind, a multimodal affective dialogue agent that performs proactive reasoning and dynamic knowledge grounding to sustain emotionally aligned and persuasive interactions. AffectMind combines three components: a Proactive Knowledge Grounding Network (PKGN) that continuously updates factual and affective context from text, vision, and prosody; an Emotion–Intent Alignment Model (EIAM) that jointly models user emotion and purchase intent to adapt persuasion strategies; and a Reinforced Discourse Loop (RDL) that optimizes emotional coherence and engagement via reinforcement signals from user responses. Experiments on two newly curated marketing dialogue datasets, MM-ConvMarket and AffectPromo, show that AffectMind outperforms strong LLM-based baselines in emotional consistency (+26%), persuasive success rate (+19%), and long-term user engagement (+23%), highlighting emotion-grounded proactivity as a key capability for commercial multimodal agents.
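The abstract describes a closed loop (fuse multimodal affect, choose a persuasion strategy conditioned on emotion and intent, reinforce from user responses) but gives no algorithms. The Python sketch below is a toy illustration under stated assumptions only: the label sets, the late-fusion weights, the reward weighting `alpha`, the simulated user, and every function and class name are hypothetical, and an epsilon-greedy contextual bandit stands in for the paper's reinforcement learning, which the abstract does not specify.

```python
import random
from collections import defaultdict

# Hypothetical label sets; the abstract does not specify AffectMind's taxonomies.
INTENTS = ("low", "medium", "high")
STRATEGIES = ("empathize", "inform", "offer_discount")


def fuse_affect(text: float, vision: float, prosody: float,
                weights=(0.5, 0.3, 0.2)) -> str:
    """Late-fuse per-modality valence scores in [-1, 1] into a coarse emotion label."""
    valence = sum(w * s for w, s in zip(weights, (text, vision, prosody)))
    if valence > 0.2:
        return "positive"
    if valence < -0.2:
        return "negative"
    return "neutral"


def reward(coherence: float, engagement: float, alpha: float = 0.5) -> float:
    """Mix two response-derived signals into one scalar reward (alpha is assumed)."""
    return alpha * coherence + (1 - alpha) * engagement


class StrategyPolicy:
    """Epsilon-greedy contextual bandit keyed on (emotion, intent).

    A toy stand-in for a reinforced dialogue loop, not the paper's algorithm.
    Optimistic initial values make the greedy policy try every strategy once per state.
    """

    def __init__(self, epsilon: float = 0.1, seed: int = 0, optimistic: float = 1.0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.value = defaultdict(lambda: optimistic)  # running mean reward per (state, strategy)
        self.count = defaultdict(int)

    def act(self, emotion: str, intent: str) -> str:
        if self.rng.random() < self.epsilon:
            return self.rng.choice(STRATEGIES)  # explore
        # Exploit: highest estimated reward; ties go to the earliest strategy.
        return max(STRATEGIES, key=lambda s: self.value[((emotion, intent), s)])

    def update(self, emotion: str, intent: str, strategy: str, r: float) -> None:
        key = ((emotion, intent), strategy)
        self.count[key] += 1
        self.value[key] += (r - self.value[key]) / self.count[key]  # incremental mean


def simulated_user(emotion: str, strategy: str) -> tuple:
    """Hypothetical user: upset users reward empathy, everyone else rewards information."""
    best = "empathize" if emotion == "negative" else "inform"
    return (0.9, 0.8) if strategy == best else (0.3, 0.2)


def train(steps: int = 500, seed: int = 0) -> StrategyPolicy:
    policy = StrategyPolicy(epsilon=0.2, seed=seed)
    env = random.Random(seed + 1)  # separate RNG for the simulated conversations
    for _ in range(steps):
        emotion = fuse_affect(*(env.uniform(-1, 1) for _ in range(3)))
        intent = env.choice(INTENTS)
        strategy = policy.act(emotion, intent)
        policy.update(emotion, intent, strategy, reward(*simulated_user(emotion, strategy)))
    return policy
```

After `policy = train()`, setting `policy.epsilon = 0.0` and calling `policy.act("negative", "low")` should return the strategy the simulated user rewards most for that state. A bandit is used here only because it is the smallest runnable form of "adapt strategy from response-derived reward"; a full dialogue agent would condition on far richer state than two discrete labels.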