Analyzing Political Text at Scale with Online Tensor LDA

📅 2025-11-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Social science research increasingly needs near-real-time topic modeling on massive political text corpora, including collections with over a billion documents. This paper introduces Tensor LDA (TLDA), a tensor-based topic model with provable parameter identifiability and theoretical sample complexity guarantees. By combining online tensor decomposition with GPU acceleration, TLDA is efficient in both computation and memory and scales linearly to billion-document datasets. The authors release an open-source, GPU-based implementation. They validate the method on two case studies: the evolution of the #MeToo movement and the discourse surrounding election fraud claims in the 2020 U.S. presidential election. The approach makes large-scale, near-real-time, theory-driven analysis of political communication feasible where it was previously prohibitive.

📝 Abstract

This paper proposes a topic modeling method that scales linearly to billions of documents. We make three core contributions: i) we present a topic modeling method, Tensor Latent Dirichlet Allocation (TLDA), that comes with guarantees that its parameters are identifiable and recoverable, along with sample complexity guarantees for large data; ii) we show that this method is computationally and memory efficient (achieving more than 3-4x the speed of prior parallelized Latent Dirichlet Allocation (LDA) methods), and that it scales linearly to text datasets with over a billion documents; iii) we provide an open-source, GPU-based implementation of this method. This scaling enables previously prohibitive analyses, and we perform two new real-world, large-scale studies of interest to political scientists: the first thorough analysis of the evolution of the #MeToo movement through the lens of over two years of Twitter conversation, and a detailed study of social media conversations about election fraud in the 2020 U.S. presidential election. This method thus gives social scientists the ability to study very large corpora and to answer important, theoretically relevant questions about salient issues in near real-time.
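To make the linear-scaling claim more concrete, here is a minimal sketch of how moment statistics for a moment-based topic model can be accumulated one mini-batch at a time, so that the work grows linearly with the number of documents and no document has to be revisited. This is an illustration under stated assumptions, not the paper's implementation: the class `StreamingMoments` and the helper `doc_pair_moment` are hypothetical names, the pair-moment formula is the standard "two distinct word positions" estimator from the method-of-moments literature, and the dense V x V array is only workable for a toy vocabulary.

```python
import numpy as np

def doc_pair_moment(counts):
    """Empirical co-occurrence of two *distinct* word positions in one document.

    counts: 1-D array of word counts with at least 2 tokens in total.
    """
    n = counts.sum()
    return (np.outer(counts, counts) - np.diag(counts)) / (n * (n - 1))

class StreamingMoments:
    """Running averages of first- and second-order word moments (hypothetical helper)."""

    def __init__(self, vocab_size):
        self.n_docs = 0
        self.m1 = np.zeros(vocab_size)                 # mean word frequency
        self.m2 = np.zeros((vocab_size, vocab_size))   # mean pair co-occurrence

    def update(self, batch):
        """Fold a mini-batch of count vectors (shape: docs x vocab) into the averages."""
        for counts in batch:
            self.n_docs += 1
            step = 1.0 / self.n_docs
            # Incremental mean: new_mean = old_mean + (x - old_mean) / n
            self.m1 += step * (counts / counts.sum() - self.m1)
            self.m2 += step * (doc_pair_moment(counts) - self.m2)

# Toy usage: 100 synthetic documents over a 6-word vocabulary, streamed in batches of 10.
rng = np.random.default_rng(0)
toy_docs = rng.integers(1, 5, size=(100, 6))
moments = StreamingMoments(vocab_size=6)
for start in range(0, len(toy_docs), 10):
    moments.update(toy_docs[start:start + 10])
```

Each document is touched exactly once and the state kept between batches has a fixed size, which is the property that lets this style of estimator run over a stream. A system at the scale described in the abstract would also need to avoid materializing dense vocabulary-sized arrays; that part is not shown here.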

Problem

Research questions and friction points this paper is trying to address.

Develops scalable topic modeling for billion-document political text analysis

Enables efficient large-scale studies of social movements like #MeToo

Provides near-real-time analysis of political discourse on election fraud

Innovation

Methods, ideas, or system contributions that make the work stand out.

Tensor LDA ensures identifiable and recoverable parameters

Scales linearly to over a billion documents while staying computationally and memory efficient

Provides open-source GPU-based implementation for large datasets
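As a rough illustration of the kind of tensor decomposition that tensor-based topic models build on, the sketch below runs the classic tensor power iteration with deflation on a small synthetic third-order tensor. It is not the paper's online, GPU-based algorithm: the function names `tensor_power_iteration` and `decompose` are hypothetical, and the code assumes a symmetric tensor whose components are exactly orthonormal with positive weights, which is the idealized setting in which recovery is known to work.

```python
import numpy as np

def tensor_power_iteration(T, n_iter=100, seed=0):
    """Find one (weight, vector) pair of a symmetric, orthogonally decomposable tensor."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=T.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        # Contract the tensor with v along two modes: v <- T(I, v, v), then renormalize.
        v = np.einsum("ijk,j,k->i", T, v, v)
        v /= np.linalg.norm(v)
    weight = np.einsum("ijk,i,j,k->", T, v, v, v)
    return weight, v

def decompose(T, n_components, n_iter=100):
    """Recover components one at a time, subtracting each from the tensor (deflation)."""
    residual = T.copy()
    weights, vectors = [], []
    for r in range(n_components):
        weight, v = tensor_power_iteration(residual, n_iter=n_iter, seed=r)
        weights.append(weight)
        vectors.append(v)
        residual -= weight * np.einsum("i,j,k->ijk", v, v, v)
    return np.array(weights), np.array(vectors)

# Toy usage: build a tensor from 3 orthonormal vectors in 4 dimensions, then recover them.
rng = np.random.default_rng(42)
basis, _ = np.linalg.qr(rng.normal(size=(4, 3)))   # columns are orthonormal
true_weights = np.array([3.0, 2.0, 1.0])
toy_tensor = np.einsum("r,ir,jr,kr->ijk", true_weights, basis, basis, basis)
found_weights, found_vectors = decompose(toy_tensor, n_components=3)
```

The order in which components come out depends on the random starting vector, so any comparison with the ground truth should be made up to a permutation of the components.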
