DomainDemo: a dataset of domain-sharing activities among different demographic groups on Twitter

📅 2025-01-14
🤖 AI Summary
This study examines how demographic attributes (age, gender, race, political party affiliation, and geography) relate to the sharing of political information on social media. Method: Using a panel of over 1.5 million Twitter users matched to their U.S. voter registration records, we construct a large-scale dataset linking user demographics to the domains shared on Twitter (X) from 2011 to 2022, combining granular sharing behavior with voter registration records. We derive five interpretable, domain-level metrics, including localness and partisan audience, to characterize over 129,000 websites; all five metrics exhibit statistically significant alignment (p < 0.001) with established domain classifications. Contribution/Results: The resulting dataset is reproducible and extensible, providing a decade-spanning, fine-grained, empirically grounded resource for digital political communication research.

📝 Abstract
Social media play a pivotal role in disseminating web content, particularly during elections, yet our understanding of the association between demographic factors and political discourse online remains limited. Here, we introduce a unique dataset, DomainDemo, linking domains shared on Twitter (X) with the demographic characteristics of associated users, including age, gender, race, political affiliation, and geolocation, from 2011 to 2022. This new resource was derived from a panel of over 1.5 million Twitter users matched against their U.S. voter registration records, facilitating a better understanding of a decade of information flows on one of the most prominent social media platforms and trends in political and public discourse among registered U.S. voters from different sociodemographic groups. By aggregating user demographic information onto the domains, we derive five metrics that provide critical insights into over 129,000 websites. In particular, the localness and partisan audience metrics quantify the domains' geographical reach and ideological orientation, respectively. These metrics show substantial agreement with existing classifications, suggesting the effectiveness and reliability of DomainDemo's approach.
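The aggregation step described above (rolling user demographics up to the domain level) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the record layout, the made-up domains and users, the function name `domain_metrics`, and both formulas. The partisan score is a simple normalized difference between Republican and Democratic sharers, and the top-state share is a crude stand-in for geographic concentration; neither is claimed to be DomainDemo's actual definition of its partisan audience or localness metrics.

```python
from collections import defaultdict

# Hypothetical share records: (user_id, party, state, domain).
# All values are invented for illustration.
shares = [
    ("u1", "D", "NY", "example-news.com"),
    ("u2", "D", "CA", "example-news.com"),
    ("u3", "R", "TX", "example-news.com"),
    ("u4", "R", "TX", "example-local.com"),
    ("u5", "R", "TX", "example-local.com"),
    ("u6", "D", "TX", "example-local.com"),
    ("u1", "D", "NY", "example-news.com"),  # repeat share by the same user
]


def domain_metrics(records):
    """Aggregate user demographics onto domains (illustrative formulas only)."""
    # Count each user once per domain so heavy sharers do not dominate.
    users_by_domain = defaultdict(dict)  # domain -> {user_id: (party, state)}
    for user_id, party, state, domain in records:
        users_by_domain[domain][user_id] = (party, state)

    metrics = {}
    for domain, members in users_by_domain.items():
        parties = [party for party, _ in members.values()]
        dem = parties.count("D")
        rep = parties.count("R")
        # Assumed score in [-1, 1]: -1 = all Democratic sharers, +1 = all Republican.
        partisan = (rep - dem) / (rep + dem) if (rep + dem) else 0.0

        state_counts = defaultdict(int)
        for _, state in members.values():
            state_counts[state] += 1
        # Assumed concentration proxy: fraction of sharers in the most common state.
        top_state_share = max(state_counts.values()) / len(members)

        metrics[domain] = {
            "n_users": len(members),
            "partisan_audience": partisan,
            "top_state_share": top_state_share,
        }
    return metrics


metrics = domain_metrics(shares)
for domain, values in metrics.items():
    print(domain, values)
```

Counting unique users rather than raw shares is a design choice in this sketch, not something the abstract specifies; a share-weighted variant would only need the deduplication step removed.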
Problem

Research questions and friction points this paper is trying to address.

Political Discourse
Social Media
Demographics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Social Demographic Analysis
Information Flow Trends
Website Localization and Political Bias Metrics