🤖 AI Summary
This paper addresses fine-grained political viewpoint classification in news text. It proposes a classification framework that combines fine-tuned large language models (LLMs) with Wikidata-based semantic enrichment of the actors mentioned in each claim. The framework builds on a pipeline that uses hybrid human-machine annotation and defines viewpoints as sets of semantically and ideologically congruent claims. Key contributions include: (i) the first integration of Wikidata entity descriptions into LLM fine-tuning, which improves discrimination between ideological claims (e.g., stances on the economic impact of immigration); and (ii) an end-to-end classification pipeline that supports long inputs. Evaluated on a UK immigration discourse benchmark, the integrated approach achieves an F1 score of 89.6%, outperforming single-module baselines by 4.2 percentage points, which supports the efficacy of knowledge-guided LLM fine-tuning for political discourse analysis.
📝 Abstract
News sources play a central role in democratic societies by shaping political and social discourse through specific topics, viewpoints and voices. Understanding these dynamics is essential for assessing whether the media landscape offers a balanced and fair account of public debate. In earlier work, we introduced a pipeline that, given a news corpus, i) uses a hybrid human-machine approach to identify the range of viewpoints expressed about a given topic, and ii) classifies relevant claims with respect to the identified viewpoints, defined as sets of semantically and ideologically congruent claims (e.g., positions arguing that immigration positively impacts the UK economy). In this paper, we improve this pipeline by i) fine-tuning Large Language Models (LLMs) for viewpoint classification and ii) enriching claim representations with semantic descriptions of relevant actors drawn from Wikidata. We evaluate our approach against alternative solutions on a benchmark centred on the UK immigration debate. Results show that while both mechanisms independently improve classification performance, their integration yields the best results, particularly when using LLMs capable of processing long inputs.
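The enrichment step described in the abstract can be illustrated with a minimal sketch. It assumes that actor descriptions have already been retrieved from Wikidata and stored in a local dictionary, and that enrichment means appending those descriptions to the claim text before it is passed to a classifier. The function names, the input layout, and the example descriptions are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def find_actors(claim: str, descriptions: Dict[str, str]) -> List[str]:
    """Return known actors mentioned in the claim, ordered by first appearance.

    Assumption: simple case-insensitive substring matching stands in for
    whatever entity linking the actual pipeline uses.
    """
    lowered = claim.lower()
    hits = [(lowered.find(name.lower()), name) for name in descriptions]
    return [name for pos, name in sorted(hits) if pos >= 0]


def enrich_claim(claim: str, descriptions: Dict[str, str]) -> str:
    """Build a classifier input: the claim followed by actor descriptions."""
    actors = find_actors(claim, descriptions)
    lines = [f"Claim: {claim}"]
    if actors:
        lines.append("Actors:")
        lines.extend(f"- {name}: {descriptions[name]}" for name in actors)
    return "\n".join(lines)


# Illustrative descriptions only; these were not fetched from Wikidata.
ACTOR_DESCRIPTIONS = {
    "Home Office": "ministerial department of the UK government",
    "Migration Observatory": "research unit at the University of Oxford",
}

if __name__ == "__main__":
    claim = "The Home Office said migrants boost the UK economy."
    print(enrich_claim(claim, ACTOR_DESCRIPTIONS))
```

Because each description adds tokens to the input, claims that mention many actors produce long inputs, which is consistent with the abstract's observation that the combination works best with LLMs capable of processing long inputs.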