Semantic Clustering of Civic Proposals: A Case Study on Brazil's National Participation Platform

📅 2025-09-25

📈 Citations: 0

✨ Influential: 0

career value

194K/year

🤖 AI Summary

On Brazil’s national participatory platform (Brasil Participativo), the high volume of citizen proposals poses challenges for manual categorization, entails heavy reliance on domain experts, and suffers from misalignment with official policy taxonomies. Method: This paper proposes a low-intervention semantic clustering approach: (1) seed-word-guided BERTopic for domain-aware topic modeling to enhance semantic consistency; and (2) large language model (LLM)-based automated validation of topic validity and institutional alignment. Contribution/Results: The method drastically reduces human annotation effort while mitigating topic drift and policy-policy misalignment inherent in conventional clustering. Empirical evaluation shows the generated topics achieve high semantic coherence (Coherence > 0.52) and strong mapping accuracy to official classifications (F1 = 0.83), with a 4.7× improvement in processing efficiency. This work provides a reproducible technical pathway for large-scale structuring of public input in digital democracy contexts.

Technology Category

Application Category

📝 Abstract

Promoting participation on digital platforms such as Brasil Participativo has emerged as a top priority for governments worldwide. However, due to the sheer volume of contributions, much of this engagement goes underutilized, as organizing it presents significant challenges: (1) manual classification is unfeasible at scale; (2) expert involvement is required; and (3) alignment with official taxonomies is necessary. In this paper, we introduce an approach that combines BERTopic with seed words and automatic validation by large language models. Initial results indicate that the generated topics are coherent and institutionally aligned, with minimal human effort. This methodology enables governments to transform large volumes of citizen input into actionable data for public policy.

Problem

Research questions and friction points this paper is trying to address.

Organizing large volumes of citizen proposals on digital participation platforms

Overcoming manual classification challenges at scale requiring expert involvement

Ensuring proposal categorization aligns with official government taxonomies

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining BERTopic with seed words

Using large language models for validation

Generating institutionally aligned topics automatically

🔎 Similar Papers

No similar papers found.