🤖 AI Summary
Existing political text classification methods generalize poorly to out-of-distribution (OOD) text. To address this, we propose the first unified framework that jointly models political orientation (left/center/right) and politicalness intensity (strong/weak). We design a Transformer-based architecture that integrates multi-source supervision signals and leverages large-scale pretraining with domain-adaptive transfer learning, and we evaluate it across domains with leave-one-in and leave-one-out protocols. We construct the first large-scale, multi-source annotated dataset for politicalness, built from 18 diverse datasets, and curate 12 political orientation datasets. Experiments show substantial OOD robustness improvements: the framework achieves an average accuracy gain of 12.3% across multiple cross-domain settings. This work is the first to empirically validate joint modeling of political orientation and politicalness, establishing a scalable, adaptable paradigm for political text analysis.
📝 Abstract
This paper addresses the challenge of automatically classifying text according to political leaning and politicalness using transformer models. We present a comprehensive overview of existing datasets and models for these tasks, finding that current approaches create siloed solutions that perform poorly on out-of-distribution texts. To address this limitation, we compile a diverse dataset by combining 12 datasets for political leaning classification and creating a new dataset for politicalness by extending 18 existing datasets with the appropriate label. Through extensive benchmarking with leave-one-in and leave-one-out methodologies, we evaluate the performance of existing models and train new ones with enhanced generalization capabilities.
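The two benchmarking protocols named above can be illustrated with a minimal sketch of how the train/evaluation splits might be enumerated. It assumes that leave-one-out means training on all datasets except one and evaluating on the held-out dataset, and that leave-one-in means training on a single dataset and evaluating on all the others; the abstract does not spell out these definitions. The dataset names and the helper functions `leave_one_out_splits` and `leave_one_in_splits` are hypothetical and used only for illustration.

```python
from typing import List, Tuple

# A split is (training dataset names, evaluation dataset names).
Split = Tuple[List[str], List[str]]


def leave_one_out_splits(names: List[str]) -> List[Split]:
    """Assumed protocol: train on every dataset except one, evaluate on the held-out one."""
    return [([n for n in names if n != held_out], [held_out]) for held_out in names]


def leave_one_in_splits(names: List[str]) -> List[Split]:
    """Assumed protocol: train on a single dataset, evaluate on all the others."""
    return [([kept], [n for n in names if n != kept]) for kept in names]


# Hypothetical dataset names, for illustration only.
datasets = ["news_a", "tweets_b", "speeches_c"]

for train, test in leave_one_out_splits(datasets):
    print("leave-one-out  train:", train, " eval:", test)

for train, test in leave_one_in_splits(datasets):
    print("leave-one-in   train:", train, " eval:", test)
```

Under these assumptions, both protocols evaluate only on datasets the model never saw during training, so each score measures out-of-distribution performance rather than in-domain fit.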