Open Political Corpora: Structuring, Searching, and Analyzing Political Text Collections with PoliCorp

2025-09-22
AI Summary
Political scientists often lack programming expertise yet require scalable time-series analysis of political texts. Method: This study develops PoliCorp, an open web platform enabling structured, interactive access to a 76-year (1949–2025) corpus of German Bundestag debates, the first such resource for this domain. Integrating NLP-based preprocessing, multi-field Boolean search, and efficient indexing, PoliCorp supports code-free advanced querying, dynamic subcorpus construction, and JSON export. Contribution/Results: Its core innovation lies in transforming long-term political discourse archives into a ready-to-use, social-science-oriented analytical infrastructure, substantially lowering barriers to qualitative text analysis. Hosted publicly (https://demo-pollux.gesis.org/), the platform provides reproducible, extensible data support for research on discursive change, policy agenda dynamics, and ideological evolution.

Abstract
In this work, we present PoliCorp (https://demo-pollux.gesis.org/), a web portal designed to facilitate the search and analysis of political text corpora. PoliCorp provides researchers with access to rich textual data, enabling in-depth analysis of parliamentary discourse over time. The platform currently features a collection of transcripts from debates in the German parliament, spanning 76 years of proceedings. With the advanced search functionality, researchers can apply logical operations to combine or exclude search criteria, making it easier to filter through vast amounts of parliamentary debate data. The search can be customised by combining multiple fields and applying logical operators to uncover complex patterns and insights within the data. Additional data processing steps were performed to enable web-based search and incorporate extra features. A key feature that differentiates PoliCorp is its intuitive web-based interface that enables users to query processed political texts without requiring programming skills. The user-friendly platform allows for the creation of custom subcorpora via search parameters, which can be freely downloaded in JSON format for further analysis.
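As an illustration of the "further analysis" step, the sketch below loads a downloaded subcorpus and counts, per year, how many records mention a term. It is a minimal example under stated assumptions: the abstract does not describe the export schema, so the JSON layout (an array of records) and the field names `date`, `party`, and `text` are hypothetical, and the sample records are invented purely for demonstration.

```python
import json
from collections import Counter

# Hypothetical export: a JSON array of speech records. The layout and the
# field names ("date", "party", "text") are assumptions made for this
# sketch, not PoliCorp's documented schema. The records are invented.
SAMPLE_EXPORT = """
[
  {"date": "1990-10-04", "party": "SPD", "text": "Die Einheit ist vollendet."},
  {"date": "1990-11-15", "party": "CDU", "text": "Zur Einheit und zum Haushalt."},
  {"date": "2015-09-09", "party": "SPD", "text": "Der Haushalt steht im Mittelpunkt."}
]
"""


def mentions_per_year(records, term):
    """Count records per year whose text contains `term` (case-insensitive)."""
    counts = Counter()
    needle = term.lower()
    for record in records:
        if needle in record["text"].lower():
            year = int(record["date"][:4])  # assumes ISO dates (YYYY-MM-DD)
            counts[year] += 1
    return dict(sorted(counts.items()))


records = json.loads(SAMPLE_EXPORT)
print(mentions_per_year(records, "Einheit"))
print(mentions_per_year(records, "Haushalt"))
```

With a real export, `SAMPLE_EXPORT` would be replaced by the contents of the downloaded file, and the field names adjusted to whatever the file actually contains.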

Problem

Research questions and friction points this paper is trying to address.

Facilitating search and analysis of political text corpora
Enabling in-depth analysis of parliamentary discourse over time
Providing intuitive access to processed political texts without programming skills

Innovation

Methods, ideas, or system contributions that make the work stand out.

Web portal for structured political text analysis
Advanced search with logical operators for filtering
User-friendly interface enabling custom subcorpora creation
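The idea of combining several search fields with logical operators can be sketched as composable predicates. This is not PoliCorp's implementation, which the source does not describe; the `Speech` record, its fields, and the helper names (`field_contains`, `year_between`, `all_of`, `any_of`, `negate`) are hypothetical and serve only to show how AND, OR, and NOT criteria could select a subcorpus.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Speech:
    # Hypothetical record; these fields are assumptions for illustration.
    year: int
    party: str
    text: str


Predicate = Callable[[Speech], bool]


def field_contains(field: str, value: str) -> Predicate:
    """Match when the named field contains `value` (case-insensitive)."""
    needle = value.lower()
    return lambda s: needle in str(getattr(s, field)).lower()


def year_between(start: int, end: int) -> Predicate:
    """Match speeches whose year lies in the inclusive range [start, end]."""
    return lambda s: start <= s.year <= end


def all_of(*preds: Predicate) -> Predicate:  # logical AND
    return lambda s: all(p(s) for p in preds)


def any_of(*preds: Predicate) -> Predicate:  # logical OR
    return lambda s: any(p(s) for p in preds)


def negate(pred: Predicate) -> Predicate:  # logical NOT
    return lambda s: not pred(s)


# Invented sample data, for demonstration only.
speeches = [
    Speech(1987, "GRÜNE", "Energie und Umwelt"),
    Speech(2019, "SPD", "Klimaschutz jetzt"),
    Speech(2019, "CDU", "Haushalt und Steuern"),
    Speech(2021, "CDU", "Energiepreise steigen"),
]

# (text has "klima" OR "energie") AND year in 2000..2025 AND NOT party SPD
query = all_of(
    any_of(field_contains("text", "klima"), field_contains("text", "energie")),
    year_between(2000, 2025),
    negate(field_contains("party", "SPD")),
)

subcorpus = [s for s in speeches if query(s)]
print(subcorpus)
```

Because each combinator returns another predicate, criteria nest to any depth, which mirrors how a multi-field Boolean query narrows a large collection to a custom subcorpus.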
Nina Smirnova
GESIS – Leibniz Institute for the Social Sciences
Muhammad Ahsan Shahid
GESIS – Leibniz Institute for the Social Sciences
Philipp Mayr
GESIS – Leibniz Institute for the Social Sciences
Interactive Information Retrieval, Informetrics, Digital libraries, Information Seeking, Dataset Search