🤖 AI Summary
Large language models (LLMs) exhibit limited interpretability in sentiment analysis for AI-driven clinical decision-making. Method: We propose "Sentiment Reasoning"—a novel task that jointly predicts sentiment labels from speech/text inputs and generates human-readable rationales. We formally define this cross-modal reasoning task, construct the largest multilingual (5 languages), multimodal sentiment analysis dataset to date, and design a multimodal multi-task learning framework that jointly fine-tunes LLMs for sentiment classification and rationale generation while accounting for ASR transcription noise. Results: Our approach generates rationales whose semantic quality is comparable to human-written ones; sentiment classification accuracy and macro-F1 both improve by 2%; and there is no statistically significant difference in rationale quality between ASR-derived and human-generated transcripts. All code, data, and models are publicly released.
📝 Abstract
Transparency in AI healthcare decision-making is crucial. By incorporating rationales that explain the reasoning behind each predicted label, users can follow Large Language Models' (LLMs') reasoning and make better-informed decisions. In this work, we introduce a new task—Sentiment Reasoning—for both speech and text modalities, along with a proposed multimodal multitask framework and the world's largest multimodal sentiment analysis dataset. Sentiment Reasoning is an auxiliary task in sentiment analysis in which the model both predicts the sentiment label and generates the rationale behind it from the input transcript. Our study, conducted on both human transcripts and Automatic Speech Recognition (ASR) transcripts, shows that Sentiment Reasoning improves model transparency by providing rationales for predictions with semantic quality comparable to human-written ones, while also improving the model's classification performance (+2% in both accuracy and macro-F1) via rationale-augmented fine-tuning. We also find no significant difference in the semantic quality of rationales generated from human versus ASR transcripts. All code, data (five languages—Vietnamese, English, Chinese, German, and French), and models are published online: https://github.com/leduckhai/Sentiment-Reasoning
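To make the task format concrete, the sketch below shows what a Sentiment Reasoning training example and its rationale-augmented prompt might look like. This is a minimal illustration under stated assumptions: the field names, label set (Positive/Neutral/Negative), and prompt wording are hypothetical and not necessarily the dataset's actual schema or the paper's exact prompt template.

```python
# Hypothetical Sentiment Reasoning example: the model receives a transcript
# (human- or ASR-generated) and must produce BOTH a sentiment label and a
# free-text rationale. Field names and label set are illustrative only.

def format_prompt(transcript: str) -> str:
    """Build a rationale-augmented prompt for a fine-tuned LLM (sketch)."""
    return (
        "Classify the sentiment of the following medical transcript as "
        "Positive, Neutral, or Negative, and explain your reasoning.\n"
        f"Transcript: {transcript}\n"
        "Answer:"
    )

# One training example: during rationale-augmented fine-tuning the model
# learns to emit the label followed by the rationale.
example = {
    "transcript": (
        "The patient reports the new medication has greatly reduced her pain."
    ),
    "label": "Positive",  # target sentiment label
    "rationale": (
        "The patient describes a clear improvement in symptoms."
    ),  # target human-readable justification
}

target_output = f"{example['label']}. Rationale: {example['rationale']}"

print(format_prompt(example["transcript"]))
print(target_output)
```

In this setup, classification and rationale generation share one decoding pass, which is how rationale-augmented fine-tuning can also improve the label accuracy itself.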