PolitiSky24: U.S. Political Bluesky Dataset with User Stance Labels

📅 2025-06-09
🤖 AI Summary
User-level political stance annotations for Bluesky during the 2024 U.S. presidential election have been lacking. This work introduces the first Bluesky user-level stance dataset targeting Kamala Harris and Donald Trump, comprising 16,044 user-target stance pairs. The dataset integrates interaction graphs, full posting histories, and engagement metadata. The authors propose a user-level stance annotation approach that combines information retrieval with LLM-driven labeling, in which each label is accompanied by a reasoning justification and supporting text spans for transparency. The annotation achieves 81% accuracy, validated against expert judgments. The dataset is publicly released on Zenodo. It fills a gap in user-level stance modeling on emerging decentralized social platforms and supports cross-platform comparative analysis and large-scale political communication research.

📝 Abstract
Stance detection identifies the viewpoint expressed in text toward a specific target, such as a political figure. While previous datasets have focused primarily on tweet-level stances from established platforms, user-level stance resources, especially on emerging platforms like Bluesky, remain scarce. User-level stance detection provides a more holistic view by considering a user's complete posting history rather than isolated posts. We present the first stance detection dataset for the 2024 U.S. presidential election, collected from Bluesky and centered on Kamala Harris and Donald Trump. The dataset comprises 16,044 user-target stance pairs enriched with engagement metadata, interaction graphs, and user posting histories. PolitiSky24 was created using a carefully evaluated pipeline combining advanced information retrieval and large language models, which generates stance labels with supporting rationales and text spans for transparency. The labeling approach achieves 81% accuracy with scalable LLMs. This resource addresses gaps in political stance analysis through its timeliness, open-data nature, and user-level perspective. The dataset is available at https://doi.org/10.5281/zenodo.15616911
Problem

Research questions and friction points this paper is trying to address.

Lack of user-level stance datasets for emerging platforms like Bluesky
Need for comprehensive political stance analysis of the 2024 U.S. election
Absence of transparent, scalable stance labeling methods with rationales
Innovation

Methods, ideas, or system contributions that make the work stand out.

First Bluesky dataset for U.S. election stances
Combines information retrieval with LLM labeling
Provides user-level stance with engagement metadata
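
The combination of retrieval and LLM labeling listed above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The record layout (`StanceLabel`), the label set (favor, against, neutral), the keyword-based `retrieve_posts`, and the cue-word `toy_llm` that stands in for a real model call are all hypothetical and chosen only to show the flow: select target-relevant posts from a user's history, produce one stance per user-target pair with a rationale and evidence spans, then score the labels against reference judgments.

```python
from dataclasses import dataclass, field

# Hypothetical aliases used for retrieval; the paper's retrieval method is not specified here.
TARGET_ALIASES = {
    "Kamala Harris": ("harris", "kamala"),
    "Donald Trump": ("trump",),
}


@dataclass
class StanceLabel:
    """One user-target stance pair with its supporting evidence (assumed schema)."""
    user_id: str
    target: str
    stance: str  # "favor", "against", or "neutral" (assumed label set)
    rationale: str
    spans: list[str] = field(default_factory=list)


def retrieve_posts(posts, target, k=5):
    """Keyword stand-in for the retrieval step: keep up to k posts mentioning the target."""
    aliases = TARGET_ALIASES[target]
    hits = [p for p in posts if any(a in p.lower() for a in aliases)]
    return hits[:k]


def toy_llm(target, posts):
    """Placeholder for an LLM call: a cue-word heuristic, not a real model."""
    positive = ("support", "vote for", "love")
    negative = ("oppose", "never", "against")
    pos = [p for p in posts if any(c in p.lower() for c in positive)]
    neg = [p for p in posts if any(c in p.lower() for c in negative)]
    if len(pos) > len(neg):
        return "favor", f"{len(pos)} post(s) express support for {target}.", pos
    if len(neg) > len(pos):
        return "against", f"{len(neg)} post(s) express opposition to {target}.", neg
    return "neutral", "No clear majority of supportive or opposing posts.", []


def label_user(user_id, posts, target, llm=toy_llm):
    """Retrieve target-relevant posts, then ask `llm` for a stance with evidence."""
    relevant = retrieve_posts(posts, target)
    if not relevant:
        return StanceLabel(user_id, target, "neutral", "No posts mention the target.")
    stance, rationale, spans = llm(target, relevant)
    return StanceLabel(user_id, target, stance, rationale, spans)


def accuracy(predicted, gold):
    """Fraction of predicted stances that match the reference labels."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    history = [
        "I will vote for Harris in November.",
        "Nice weather today.",
        "Harris has my support.",
    ]
    print(label_user("user-1", history, "Kamala Harris"))
```

Keeping the rationale and spans on the same record as the stance is what makes each label auditable: a reviewer can check the quoted posts directly instead of rereading the full posting history.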