"Yeah Right!"-- Do LLMs Exhibit Multimodal Feature Transfer?

📅 2025-01-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates large language models’ (LLMs) cross-domain capability to comprehend implicit deception in human dialogue—such as irony and sarcasm—and examines whether multimodal skill transfer enhances pragmatic awareness. Method: We propose a novel paradigm integrating speech-text joint pretraining with fine-tuning on human-to-human conversational data, and systematically evaluate both multimodal (speech+text) and unimodal (text-only) LLMs on zero-shot deceptive utterance detection. Contribution/Results: Experimental results demonstrate that either speech-text multimodal pretraining or human dialogue fine-tuning alone significantly improves implicit deception recognition; their combination yields further gains. Crucially, multimodal models surpass text-only baselines without additional prompting. This work provides the first empirical evidence that multimodal pretraining facilitates cross-modal intent representation learning and transferable implicit semantic understanding—offering a new pathway toward socially aware conversational AI.

📝 Abstract
Human communication is a multifaceted and multimodal skill. Communication requires an understanding of both the surface-level textual content and the connotative intent of a piece of communication. In humans, learning to go beyond the surface level starts by learning communicative intent in speech. Once humans acquire these skills in spoken communication, they transfer those skills to written communication. In this paper, we assess the ability of speech+text models and text models trained with special emphasis on human-to-human conversations to make this multimodal transfer of skill. We specifically test these models on their ability to detect covert deceptive communication. We find that with no special prompting speech+text LLMs have an advantage over unimodal LLMs in performing this task. Likewise, we find that human-to-human conversation-trained LLMs are also advantaged in this skill.
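The evaluation described above (labeling utterances as deceptive or sincere with no special prompting, then comparing accuracy across models) can be sketched as a minimal harness. This is a hypothetical illustration, not the paper's actual code: `query_model` is a stand-in for a real LLM call, and the cue-matching heuristic inside it only exists to make the sketch runnable.

```python
# Hypothetical zero-shot deception-detection harness (assumption: a real
# implementation would replace query_model with an actual LLM API call).

def query_model(utterance: str) -> str:
    """Stand-in for an LLM queried with no special prompting.

    Here it just flags common sarcasm cues so the sketch runs end to end.
    """
    cues = ("yeah right", "sure you did", "oh, great")
    return "deceptive" if any(c in utterance.lower() for c in cues) else "sincere"

def zero_shot_accuracy(examples):
    """Score a model on (utterance, gold_label) pairs, as in the paper's setup."""
    correct = sum(query_model(utt) == gold for utt, gold in examples)
    return correct / len(examples)

toy_set = [
    ("Yeah right, that totally happened.", "deceptive"),
    ("The meeting starts at noon.", "sincere"),
]
print(zero_shot_accuracy(toy_set))  # 1.0 on this toy set
```

Running the same harness over a multimodal (speech+text) model and a text-only baseline, with identical inputs and no added prompts, is the comparison the abstract reports.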
Problem

Research questions and friction points this paper is trying to address.

Cross-domain Learning
Language Models
Deception Detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Learning
Cross-domain Social Skills
Lie Detection