A Dataset for Analysing News Framing in Chinese Media

📅 2025-03-06
🤖 AI Summary
Problem: Research on automated news framing detection lacks a Chinese benchmark dataset, and the lexical complexity and distinctive linguistic features of Chinese hinder cross-lingual framing analysis. Method: We introduce CN-Frame, the first expert-annotated news framing dataset for Chinese media. It covers mainstream Chinese news texts, is curated under rigorous quality control, and can be used either on its own or as an extension of the SemEval-2023 Task 3 dataset. Contribution/Results: We evaluate two baselines, fine-tuned XLM-RoBERTa-Base and zero-shot GPT-4o, using micro-F1. Fine-tuning on CN-Frame alone achieves 0.719, and adding SemEval-2023 data raises this to 0.753; both exceed GPT-4o's zero-shot score of 0.621. These results support CN-Frame's usefulness for multilingual framing identification and address a gap in Chinese NLP resources.

📝 Abstract
Framing is an essential device in news reporting, allowing the writer to influence public perceptions of current affairs. While automatic news framing detection datasets exist in various languages, none focuses on news framing in the Chinese language, which has complex character meanings and unique linguistic features. This study introduces the first Chinese News Framing dataset, to be used as either a stand-alone dataset or a supplementary resource to the SemEval-2023 Task 3 dataset. We detail its creation and run baseline experiments to highlight the need for such a dataset and to create benchmarks for future research, providing results obtained through fine-tuning XLM-RoBERTa-Base and using GPT-4o in the zero-shot setting. We find that GPT-4o performs significantly worse than fine-tuned XLM-RoBERTa across all languages. For the Chinese language, we obtain an F1-micro score (the performance metric for SemEval-2023 Task 3, subtask 2) of 0.719 using only samples from our Chinese News Framing dataset, and a score of 0.753 when we augment the SemEval dataset with Chinese news framing samples. Given these positive frame detection results, the dataset is a valuable resource for detecting news frames in the Chinese language and a useful supplement to the SemEval-2023 Task 3 dataset.
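To make the reported metric concrete, the following is a minimal sketch of how an F1-micro score can be computed, assuming a multi-label setup in which each article may carry several frames. The function name `micro_f1`, the example frame labels, and the toy predictions are illustrative assumptions and are not taken from the paper or its dataset.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over multi-label predictions.

    Counts true positives, false positives and false negatives
    across all articles and labels, then computes a single F1.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of articles")

    tp = fp = fn = 0
    for gold_frames, pred_frames in zip(gold, pred):
        tp += len(gold_frames & pred_frames)   # frames correctly predicted
        fp += len(pred_frames - gold_frames)   # predicted but not annotated
        fn += len(gold_frames - pred_frames)   # annotated but missed

    denominator = 2 * tp + fp + fn
    # Convention chosen here: return 0.0 when there are no labels at all.
    return 2 * tp / denominator if denominator else 0.0


# Toy example with hypothetical frame labels (not real dataset entries).
gold = [{"Economic", "Political"}, {"Health_and_safety"}, {"Political"}]
pred = [{"Economic"}, {"Health_and_safety", "Morality"}, {"Political"}]

print(micro_f1(gold, pred))  # 3 TP, 1 FP, 1 FN -> 0.75
```

Because micro-averaging pools counts over all labels before computing F1, frequent frames weigh more heavily in the score than rare ones.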
Problem

Research questions and friction points this paper is trying to address.

Lack of Chinese news framing datasets for analysis.
Need for benchmarks in Chinese news framing detection.
Improving detection accuracy using fine-tuned models.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Creates the first Chinese News Framing dataset.
Shows that fine-tuned XLM-RoBERTa outperforms zero-shot GPT-4o.
Shows that augmenting the SemEval dataset with Chinese samples improves the F1-micro score.