A Blue Start: A large-scale pairwise and higher-order social network dataset

📅 2025-05-16

🤖 AI Summary

Existing network research focuses predominantly on pairwise interactions and overlooks higher-order (group) interactions, which are central to processes such as disease transmission and information diffusion. Empirical data are also scarce, especially large-scale benchmark datasets that capture both pairwise relations and higher-order group structure. Method: Using the open Bluesky API, we systematically collect user-created "starter packs" (self-organized groups that serve as higher-order structures), parse their memberships, and gather the associated follower-followee relationships. Contribution/Results: We introduce the first large-scale social network dataset that integrates both pairwise and higher-order structure: 26.7 million nodes, 1.6 billion directed follow edges, and about 301,000 higher-order groups. The dataset fills a gap in empirical higher-order network research and provides a benchmark for modeling group-level dynamics.
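The summary describes a dataset with two layers: directed follow edges between users and starter packs treated as groups (hyperedges) of arbitrary size. The sketch below shows one way such a combined structure could be held in memory and queried. The class name PairwiseHigherOrderNetwork and its methods are hypothetical illustrations; the paper's actual file format and tooling are not described here. It computes two simple bridging quantities: how densely a group's members follow each other, and the pairwise clique projection of the groups.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class PairwiseHigherOrderNetwork:
    """Hypothetical layout: directed follow edges plus starter-pack hyperedges."""

    # follower -> set of followees (directed pairwise layer)
    following: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))
    # group id -> members (higher-order layer, one hyperedge per starter pack)
    groups: dict[str, frozenset[str]] = field(default_factory=dict)

    def add_follow(self, follower: str, followee: str) -> None:
        # Record the directed edge follower -> followee.
        self.following[follower].add(followee)

    def add_group(self, group_id: str, members) -> None:
        # Store a group of arbitrary size as an immutable member set.
        self.groups[group_id] = frozenset(members)

    def group_sizes(self) -> dict[str, int]:
        return {gid: len(members) for gid, members in self.groups.items()}

    def internal_follow_density(self, group_id: str) -> float:
        """Fraction of ordered member pairs (u, v), u != v, in which u follows v."""
        members = self.groups[group_id]
        n = len(members)
        if n < 2:
            return 0.0
        # .get avoids inserting empty entries into the defaultdict.
        hits = sum(
            1
            for u in members
            for v in members
            if u != v and v in self.following.get(u, ())
        )
        return hits / (n * (n - 1))

    def clique_projection(self) -> set[frozenset[str]]:
        """Undirected pairwise edges induced by co-membership in any group."""
        edges: set[frozenset[str]] = set()
        for members in self.groups.values():
            for u, v in combinations(sorted(members), 2):
                edges.add(frozenset((u, v)))
        return edges


net = PairwiseHigherOrderNetwork()
net.add_follow("alice", "bob")
net.add_follow("bob", "alice")
net.add_follow("alice", "carol")
net.add_group("pack1", ["alice", "bob", "carol"])
print(net.internal_follow_density("pack1"))  # 3 of 6 ordered pairs -> 0.5
```

Plain dictionaries keep the sketch readable, but at the dataset's reported scale (1.6 billion edges) a sparse-matrix or on-disk representation would be far more realistic.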

📝 Abstract

Large-scale networks have been instrumental in shaping how we think about the ways individuals interact with one another, yielding key insights in mathematical epidemiology, computational social science, and biology. However, many of the underlying social systems through which diseases spread, information disseminates, and individuals interact are inherently mediated through groups of arbitrary size, known as higher-order interactions. There is a gap between models of higher-order dynamics (group formation and fragmentation, contagion spread, and social influence) and the data needed to validate these mechanisms. Likewise, few datasets bridge pairwise and higher-order network data. Because of its open API, the Bluesky social media platform provides a laboratory for observing social ties at scale. Unlike many other social networks, Bluesky offers, in addition to pairwise following relationships, user-curated lists known as "starter packs" as a mechanism for social network growth. We introduce "A Blue Start", a large-scale network dataset comprising 26.7M users, their 1.6B pairwise following relationships, and 301.3K groups representing starter packs. This dataset will be an essential resource for the study of higher-order network science.

Problem

Research questions and friction points this paper is trying to address.

Lack of large-scale datasets for higher-order social interactions

Gap between theoretical higher-order dynamics and empirical validation

Need for datasets bridging pairwise and group-based network structures

Innovation

Methods, ideas, or system contributions that make the work stand out.

Utilizes Bluesky's open API for data collection

Includes pairwise and higher-order interaction data

Features user-curated starter packs for network analysis
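Collecting follow relationships through an open API typically means walking through paginated responses. The AT Protocol endpoints used by Bluesky, such as app.bsky.graph.getFollows, return results in pages together with a cursor for the next page. The paper's crawler is not described here, so the following is only a minimal sketch of cursor-based paging under that assumption. The fetch_page callable and the helper names iterate_pages and collect_follow_edges are hypothetical, and no real HTTP calls are made; a fake page source stands in for the API.

```python
from typing import Callable, Iterator, Optional

# Hypothetical page shape: (items, next_cursor); next_cursor is None on the last page.
Page = tuple[list[str], Optional[str]]


def iterate_pages(
    fetch_page: Callable[[Optional[str]], Page], max_pages: int = 10_000
) -> Iterator[str]:
    """Yield items from a cursor-paginated source until no cursor is returned."""
    cursor: Optional[str] = None
    for _ in range(max_pages):
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            return
    # Guard against sources that never stop returning a cursor.
    raise RuntimeError("page limit reached before the cursor was exhausted")


def collect_follow_edges(
    actor: str, fetch_page: Callable[[Optional[str]], Page]
) -> list[tuple[str, str]]:
    """Turn one actor's paginated follows into directed (follower, followee) edges."""
    return [(actor, followee) for followee in iterate_pages(fetch_page)]


# Fake two-page source with made-up identifiers in place of real API responses.
fake_pages = {
    None: (["did:plc:a", "did:plc:b"], "c1"),
    "c1": (["did:plc:c"], None),
}
edges = collect_follow_edges("did:plc:me", lambda c: fake_pages[c])
print(edges)
```

Injecting fetch_page keeps the paging logic independent of any particular HTTP client, authentication, or rate-limit handling, which a real crawler at this scale would also need.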
