🤖 AI Summary
This study investigates how event sampling strategies influence comparative conclusions about whether social media or news outlets report breaking news first. Applying a single downstream pipeline, the authors examine two event samples, one drawn from the Wikipedia Current Events Portal ranked by article pageviews (N = 50) and one from Polymarket prediction markets ranked by trading volume (N = 109), using data from one commercial social-listening provider across nine indexed channels. Performance is evaluated with median reporting delay and channel win rate. The conclusions are sensitive to sampling: news leads X by a median of 21.6 minutes on the Wikipedia sample, while the two are effectively tied on the Polymarket sample. Bluesky, Facebook public, and YouTube together account for 24-32% of earliest reports, and the provider's index returns no on-topic evidence for 24% of randomly sampled WCEP events, underscoring the impact of sampling and coverage gaps on news diffusion studies.
📝 Abstract
Osborne and Dredze (2014) reported that Twitter was the timeliest social-media source of breaking news, trailing only newswire. Twelve years on, the platform landscape has shifted (Google+ is gone, X replaced Twitter, Bluesky and Threads have appeared), and platform data now flows almost exclusively through commercial social-listening providers that redact key fields. We revisit the question with two sampling designs run through the same downstream pipeline. Sample A draws N = 50 events from the Wikipedia Current Events Portal (WCEP) ranked by article pageviews. Sample B draws N = 109 events from Polymarket prediction markets ranked by USD trading volume, with each event's news moment pinned to the largest 1-hour trade-volume spike. Both samples are pulled from one commercial provider across nine indexed channels. We report three findings. (1) The X-vs-news direction depends on the sample. News leads X by a median of 21.6 min on Sample A (n = 6 paired); the same comparison is tied at -0.02 min on Sample B (n = 16 paired, X earliest in 38%). (2) The channel ecosystem has diversified. Bluesky, Facebook public, and YouTube together account for 24-32% of earliest channel wins; the 2014 "X versus newswire" framing no longer fits. (3) Coverage gaps are structural. Even with U.S.-relevance filtering and a pageview prior, the provider's index returns no on-topic evidence on 24% of randomly sampled WCEP events. The paper's contribution is the cross-surface design that exposes the sample dependency in (1).
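The quantities reported above (a paired median lag, earliest-channel win shares, and a news moment pinned to a volume spike) can be illustrated with a minimal sketch. This is not the authors' pipeline. All function names, the input layout, and the toy numbers are hypothetical. The sketch assumes that a positive lag means news reported before X, that "largest 1-hour spike" means the clock hour with the highest total USD volume, and that ties for earliest channel go to the channel listed first.

```python
from statistics import median


def paired_median_lag(first_seen, lead="news", follow="x"):
    """Median of (follow - lead) first-report times, in minutes.

    Only events where BOTH channels have a first report are paired.
    Assumed sign convention: positive means `lead` reported first.
    """
    lags = [ev[follow] - ev[lead]
            for ev in first_seen.values()
            if lead in ev and follow in ev]
    return (median(lags), len(lags)) if lags else (None, 0)


def earliest_channel_share(first_seen):
    """Share of covered events on which each channel had the earliest report."""
    wins, covered = {}, 0
    for ev in first_seen.values():
        if not ev:  # no on-topic evidence on any channel
            continue
        covered += 1
        winner = min(ev, key=ev.get)  # ties go to the first-listed channel
        wins[winner] = wins.get(winner, 0) + 1
    return {ch: n / covered for ch, n in wins.items()}


def uncovered_share(first_seen):
    """Fraction of sampled events with no evidence on any channel."""
    return sum(1 for ev in first_seen.values() if not ev) / len(first_seen)


def largest_hourly_spike(trades):
    """Start (unix seconds) of the clock hour with the highest USD volume.

    `trades` is a list of (unix_seconds, usd_volume) pairs.
    """
    buckets = {}
    for ts, usd in trades:
        hour = ts - ts % 3600
        buckets[hour] = buckets.get(hour, 0.0) + usd
    return max(buckets, key=buckets.get)


# Toy data: first-report times in minutes from an arbitrary origin.
TOY_FIRST_SEEN = {
    "event-1": {"news": 0.0, "x": 20.0, "bluesky": 35.0},
    "event-2": {"news": 10.0, "x": 4.0},
    "event-3": {"youtube": 2.0, "news": 12.0},
    "event-4": {},  # provider returned nothing on-topic
}
TOY_TRADES = [(3600, 100.0), (3700, 50.0), (7300, 120.0), (10900, 10.0)]
```

On the toy data, only `event-1` and `event-2` enter the news-vs-X pairing, which mirrors how the paired counts in finding (1) are much smaller than the sample sizes: an event contributes to the median only when both channels return evidence for it.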