🤖 AI Summary
Existing classical Chinese poetry sentiment analysis predominantly relies on textual semantics, neglecting prosodic (recitation audio) and visual (accompanying paintings) modalities. This paper proposes the first multimodal framework for classical poetry sentiment understanding that jointly models “sound,” “form,” and “meaning.” It introduces, for the first time, multi-dialect recitation audio to capture historical phonological affective cues; integrates generative Chinese painting–style visual representations with CLIP-style cross-modal encoding; and enhances classical Chinese textual representation via large language model–augmented translation. A Multimodal Contrastive Representation Learning (MMCLR) strategy is designed to enable synergistic perception across the audio, visual, and textual modalities. Evaluated on two public benchmarks, the method achieves an absolute accuracy gain of at least 2.51% and a macro-F1 improvement of at least 1.63%. The code is publicly released, establishing a new computational-humanities paradigm for classical poetry sentiment analysis.
📝 Abstract
Classical Chinese poetry is a vital and enduring part of Chinese literature, conveying profound emotional resonance. Existing studies analyze sentiment based on textual meanings, overlooking the unique rhythmic and visual features inherent in poetry, especially since it is often recited and accompanied by Chinese paintings. In this work, we propose a dialect-enhanced multimodal framework for classical Chinese poetry sentiment analysis. We extract sentence-level audio features from the poetry and incorporate audio from multiple dialects, which may retain regional ancient Chinese phonetic features, enriching the phonetic representation. Additionally, we generate sentence-level visual features, and the multimodal features are fused with textual features enhanced by LLM translation through multimodal contrastive representation learning. Our framework outperforms state-of-the-art methods on two public datasets, achieving at least a 2.51% improvement in accuracy and 1.63% in macro F1. We open-source the code to facilitate research in this area and to provide insights for general multimodal Chinese representation.
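The CLIP-style multimodal contrastive objective mentioned above can be sketched as pairwise symmetric InfoNCE losses between aligned text, audio, and visual embeddings. This is a minimal NumPy illustration, not the paper's actual implementation: the function names, the three-way pairing scheme, and the temperature value are assumptions for exposition.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere, as in CLIP-style training."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def info_nce(a, b, temperature=0.07):
    """Symmetric InfoNCE loss between two aligned batches of embeddings.

    Row i of `a` and row i of `b` are treated as a positive pair (e.g. a
    poem line's text embedding and its recitation-audio embedding); all
    other rows in the batch serve as negatives.
    """
    a, b = l2_normalize(a), l2_normalize(b)
    logits = a @ b.T / temperature            # (N, N) cosine-similarity matrix
    idx = np.arange(len(a))                   # positives lie on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[idx, idx].mean()

    # average the two retrieval directions (a -> b and b -> a)
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

def multimodal_contrastive_loss(text, audio, visual, temperature=0.07):
    """Average the pairwise contrastive losses over the three modalities
    (one hypothetical way to extend two-modality CLIP to three)."""
    return (info_nce(text, audio, temperature)
            + info_nce(text, visual, temperature)
            + info_nce(audio, visual, temperature)) / 3.0

# Toy usage: random 8-sample batches of 32-d embeddings per modality.
rng = np.random.default_rng(0)
text, audio, visual = (rng.standard_normal((8, 32)) for _ in range(3))
loss = multimodal_contrastive_loss(text, audio, visual)
```

Minimizing such a loss pulls the three embeddings of the same poem line together while pushing apart embeddings of different lines, which is the "synergistic perception" role MMCLR plays in the fusion step.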