🤖 AI Summary
This work addresses a strategic vulnerability of language models aligned to human preferences: in adversarial negotiation, emotionally framed language can steer them toward the counterparty's interests. To mitigate this, the authors propose EmoDistill, an offline framework that treats emotion as a strategic action channel rather than a surface style. Using GoEmotions-based affective prompting, they first show that emotion substantially shifts negotiation outcomes. EmoDistill then decomposes emotional strategy into two components: emotion selection, where an Implicit Q-Learning (IQL) selector learns which emotion to express, and emotion expression, where a LoRA-based policy learns how to express it through Supervised Fine-Tuning (SFT) and Judge Policy Optimization (JPO). Across four emotion-sensitive, high-stakes negotiation domains, small language model (SLM) policies trained with EmoDistill achieve the highest utility, outperforming vanilla SLM/LLM baselines and IQL-only emotion selection. Ablations show that emotion conditioning is essential, and transfer studies show generalization across domains, unseen counterparties, and trained-vs-trained tournaments.
📝 Abstract
Post-trained LLMs are often optimized to align responses with human preferences, making them safe, polite, and conversationally appropriate. In adversarial negotiation, however, this alignment can become a vulnerability: emotionally framed language may steer agents toward the counterparty's interests. Using GoEmotions-based affective prompting, we show that emotion substantially shifts negotiation outcomes, suggesting that emotion is a strategic action channel rather than a surface style. We therefore introduce EmoDistill, an offline framework for distilling emotional negotiation skills into language model agents. EmoDistill decomposes emotional strategy into emotion selection and emotion expression: an Implicit Q-Learning (IQL) selector learns which emotion to express, while a Low-Rank Adaptation (LoRA)-based policy learns how to express it through Supervised Fine-Tuning (SFT) and Judge Policy Optimization (JPO). Across four emotion-sensitive, high-stakes negotiation domains, small language model (SLM) policies trained under the EmoDistill framework achieve the highest utility, outperforming vanilla SLM/LLM baselines and IQL-only emotion selection. Ablations show that emotion conditioning is essential, and transfer studies demonstrate generalization across domains, unseen counterparties, and trained-vs-trained tournaments. Overall, EmoDistill learns skills from offline agent-to-agent interactions, avoiding costly online negotiation during training.
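The split between choosing an emotion and expressing it can be illustrated with a minimal Python sketch. It is not the authors' implementation. The expectile loss is the standard asymmetric loss that IQL uses to fit its value function. The function names, the Q-values, the prompt format, and the default `tau` are hypothetical and chosen only for illustration. The emotion labels are taken from the GoEmotions taxonomy.

```python
from typing import Dict


def expectile_loss(diff: float, tau: float = 0.7) -> float:
    """Asymmetric squared loss from IQL, with diff = Q(s, a) - V(s).

    Positive differences are weighted by tau and negative ones by 1 - tau,
    so tau > 0.5 pushes V toward the upper range of the Q-values.
    The default tau here is an illustrative assumption.
    """
    weight = tau if diff > 0 else 1.0 - tau
    return weight * diff * diff


def select_emotion(q_values: Dict[str, float]) -> str:
    """Selection step: pick the emotion with the highest estimated Q-value."""
    return max(q_values, key=q_values.get)


def build_conditioned_prompt(emotion: str, context: str) -> str:
    """Expression step input: condition the generator on the chosen emotion.

    The tag format is hypothetical; the abstract does not specify one.
    """
    return f"[emotion: {emotion}] {context}"


# Hypothetical Q-values for one negotiation state.
q_values = {"anger": 0.2, "gratitude": 0.9, "disappointment": 0.5}
chosen = select_emotion(q_values)
prompt = build_conditioned_prompt(chosen, "The counterparty offers a 5% discount.")
print(chosen)
print(prompt)
```

In this sketch the selector only decides which emotion to use. A separately fine-tuned policy (LoRA with SFT and JPO in the abstract) would then generate the actual utterance from the conditioned prompt.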