Scaling Crowdsourced Election Monitoring: Construction and Evaluation of Classification Models for Multilingual and Cross-Domain Classification Settings

📅 2025-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the automatic classification of multilingual crowdsourced election reports across electoral domains. We propose a two-stage classification framework: first detecting informative reports, then assigning each a fine-grained information type. Methodologically, we combine XLM-RoBERTa and multilingual Sentence-BERT (SBERT) embeddings with linguistically motivated heuristic features, and evaluate zero-shot and few-shot transfer to a new electoral domain. Experiments reach F1-scores of 77% for informativeness detection and 75% for information-type classification; cross-domain transfer reaches 59% F1 in the zero-shot setting and 63% in the few-shot setting. We also quantify a performance disparity between English and Swahili reports, likely caused by training-data imbalance. Overall, the approach improves the scalability and generalizability of crowdsourced election monitoring, offering a reusable pathway for low-resource, multilingual election observation systems.

📝 Abstract
The adoption of crowdsourced election monitoring as a complementary alternative to traditional election monitoring is on the rise. Yet, its reliance on digital response volunteers to manually process incoming election reports poses a significant scaling bottleneck. In this paper, we address the challenge of scaling crowdsourced election monitoring by advancing the task of automated classification of crowdsourced election reports to multilingual and cross-domain classification settings. We propose a two-step classification approach of first identifying informative reports and then categorising them into distinct information types. We conduct classification experiments using multilingual transformer models such as XLM-RoBERTa and multilingual embeddings such as SBERT, augmented with linguistically motivated features. Our approach achieves F1-Scores of 77% for informativeness detection and 75% for information type classification. We conduct cross-domain experiments, applying models trained in a source electoral domain to a new target electoral domain in zero-shot and few-shot classification settings. Our results show promising potential for model transfer across electoral domains, with F1-Scores of 59% in zero-shot and 63% in few-shot settings. However, our analysis also reveals a performance bias in detecting informative English reports over Swahili, likely due to imbalances in the training data, indicating a need for caution when deploying classification models in real-world election scenarios.
Problem

Research questions and friction points this paper is trying to address.

Automated classification of multilingual election reports.
Cross-domain model transfer for election monitoring.
Performance bias in detecting informative reports across languages.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-step classification: informativeness detection followed by information-type classification.
Multilingual transformer models (XLM-RoBERTa) and multilingual sentence embeddings (SBERT), augmented with linguistically motivated features.
Cross-domain model transfer in zero-shot and few-shot settings.