🤖 AI Summary
To address the large literal-to-metaphorical semantic gap between visual and textual modalities in internet memes and the high computational cost of generative approaches, this paper proposes a lightweight and efficient multimodal metaphor recognition framework. Methodologically, it integrates CLIP-based cross-modal encoding, prompt-guided feature fusion, and a novel Spherical Linear Interpolation (SLERP)-based concept drift mechanism operating on CLIP embeddings to explicitly model the semantic evolution from literal to metaphorical interpretations. Additionally, it introduces an adapter-style LayerNorm fine-tuning strategy that optimizes only normalization layer parameters, drastically reducing training overhead. Evaluated on the MET-Meme benchmark, the method achieves state-of-the-art performance while reducing training FLOPs by an order of magnitude compared to mainstream generative models. Ablation studies confirm statistically significant improvements attributable to both innovations.
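The core of the concept-drift mechanism is Spherical Linear Interpolation between two CLIP embeddings. A minimal sketch of SLERP follows; the paper's exact drift formulation (which embeddings are mixed and with what interpolation weight `t`) is not specified here, so the function below is only the generic operation:

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float) -> np.ndarray:
    """Spherical linear interpolation between embeddings a and b.

    Both vectors are normalized to the unit sphere; the result is a unit
    vector that travels along the great circle from a (t=0) to b (t=1).
    """
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    dot = np.clip(np.dot(a, b), -1.0, 1.0)
    theta = np.arccos(dot)  # angle between the two embeddings
    if theta < 1e-6:
        # Nearly parallel vectors: fall back to linear interpolation
        return (1.0 - t) * a + t * b
    return (np.sin((1.0 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)
```

Unlike plain linear interpolation, SLERP keeps the interpolated embedding on the unit hypersphere where CLIP embeddings are compared, which is presumably why the authors prefer it for generating the drifted concept.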
📝 Abstract
Metaphorical imagination, the ability to connect seemingly unrelated concepts, is fundamental to human cognition and communication. While understanding of linguistic metaphors has advanced significantly, grasping multimodal metaphors, such as those found in internet memes, presents unique challenges due to their unconventional expressions and implied meanings. Existing methods for multimodal metaphor identification often struggle to bridge the gap between literal and figurative interpretations. Generative approaches that utilize large language models or text-to-image models, while promising, suffer from high computational costs. This paper introduces **C**oncept **D**rift **G**uided **L**ayerNorm **T**uning (**CDGLT**), a novel, training-efficient framework for multimodal metaphor identification. CDGLT incorporates two key innovations: (1) Concept Drift, a mechanism that applies Spherical Linear Interpolation (SLERP) to cross-modal embeddings from a CLIP encoder to generate a new, divergent concept embedding; this drifted concept helps close the gap between literal features and the figurative task. (2) A prompt construction strategy that adapts pre-trained language models' feature extraction and fusion to the multimodal metaphor identification task. CDGLT achieves state-of-the-art performance on the MET-Meme benchmark while significantly reducing training costs compared to existing generative methods. Ablation studies demonstrate the effectiveness of both Concept Drift and our adapted LN Tuning approach. Our method represents a significant step towards efficient and accurate multimodal metaphor understanding. The code is available at https://github.com/Qianvenh/CDGLT.
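The training-efficiency claim rests on LayerNorm tuning: freezing the backbone and updating only the normalization layers' affine parameters. A minimal PyTorch sketch of that idea is below; the paper's exact recipe (which normalization layers of CLIP are unfrozen, and any adapter-specific details) may differ:

```python
import torch.nn as nn

def apply_ln_tuning(model: nn.Module) -> nn.Module:
    """Freeze all parameters except LayerNorm affine weights and biases.

    This is a generic sketch of LN tuning: every parameter owned directly
    by an nn.LayerNorm module stays trainable; everything else is frozen.
    """
    for module in model.modules():
        trainable = isinstance(module, nn.LayerNorm)
        for param in module.parameters(recurse=False):
            param.requires_grad = trainable
    return model
```

Because LayerNorm parameters are a tiny fraction of a transformer's weights, the optimizer state and gradient computation for the trainable set shrink accordingly, which is the source of the reduced training overhead.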