Semantic Clustering of Civic Proposals: A Case Study on Brazil's National Participation Platform

📅 2025-09-25
🤖 AI Summary
Problem: On Brazil’s national participatory platform (Brasil Participativo), the high volume of citizen proposals makes manual categorization impractical; current practice relies heavily on domain experts and often fails to align with official policy taxonomies. Method: This paper proposes a low-intervention semantic clustering approach: (1) seed-word-guided BERTopic for domain-aware topic modeling that improves semantic consistency; and (2) large language model (LLM)-based automated validation of topic validity and institutional alignment. Contribution/Results: The method drastically reduces human annotation effort while mitigating the topic drift and policy misalignment common in conventional clustering. Empirical evaluation shows that the generated topics achieve high semantic coherence (coherence > 0.52) and strong mapping accuracy to official classifications (F1 = 0.83), with a 4.7× improvement in processing efficiency. This work provides a reproducible technical pathway for structuring public input at scale in digital democracy contexts.
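The summary reports mapping accuracy to official classifications as F1 = 0.83 but does not say how the score was averaged. As an illustration only, the sketch below assumes macro-averaged F1: predicted categories are compared with the official ones, per-category precision and recall are computed, and the per-category F1 scores are averaged with equal weight. The category names are hypothetical, and the toy data is not meant to reproduce the reported result.

```python
from collections import Counter


def macro_f1(gold, pred):
    """Macro-averaged F1 over every label seen in gold or pred (assumed averaging)."""
    labels = set(gold) | set(pred)
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct assignment to the official category
        else:
            fp[p] += 1          # predicted category received a wrong item
            fn[g] += 1          # official category missed an item
    scores = []
    for label in labels:
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)


# Hypothetical official labels vs. labels derived from generated topics.
gold = ["Saúde", "Saúde", "Educação", "Educação"]
pred = ["Saúde", "Educação", "Educação", "Educação"]
print(round(macro_f1(gold, pred), 4))  # 0.7333
```

Macro averaging gives small official categories the same weight as large ones; a micro-averaged score would instead be dominated by the most frequent categories.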

📝 Abstract
Promoting participation on digital platforms such as Brasil Participativo has emerged as a top priority for governments worldwide. However, due to the sheer volume of contributions, much of this engagement goes underutilized, as organizing it presents significant challenges: (1) manual classification is unfeasible at scale; (2) expert involvement is required; and (3) alignment with official taxonomies is necessary. In this paper, we introduce an approach that combines BERTopic with seed words and automatic validation by large language models. Initial results indicate that the generated topics are coherent and institutionally aligned, with minimal human effort. This methodology enables governments to transform large volumes of citizen input into actionable data for public policy.
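The core idea of seeding the topic model with policy vocabulary can be illustrated without the full BERTopic pipeline. BERTopic's guided mode accepts a `seed_topic_list` of word lists; as we understand its documentation, each seed topic is embedded, and each document is pre-labeled with its most similar seed topic (or left unassigned when it is closer to the average document embedding) before clustering, which nudges clusters toward those themes. The pure-Python sketch below reproduces only that pre-labeling step. The three-dimensional toy "embeddings", the health/education seed topics, and the function names are hypothetical stand-ins for sentence-transformer vectors; this is not the authors' code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean_vector(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def seed_labels(doc_vecs, seed_vecs):
    """Pre-label each document with the index of its most similar seed topic.

    A document gets -1 (unassigned) when it is closer to the corpus mean
    embedding than to its best-matching seed-topic embedding.
    """
    centroid = mean_vector(doc_vecs)
    labels = []
    for d in doc_vecs:
        sims = [cosine(d, s) for s in seed_vecs]
        best = max(range(len(sims)), key=sims.__getitem__)
        labels.append(best if sims[best] > cosine(d, centroid) else -1)
    return labels


# Hypothetical axes: [health, education, transport].
seeds = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]       # "saúde" and "educação" seed topics
docs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.2, 0.2, 0.9]]
print(seed_labels(docs, seeds))  # [0, 1, -1]
```

The third proposal is mostly about transport, which no seed covers, so it stays unassigned; in a guided pipeline such documents are still free to form their own, unseeded clusters.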
Problem

Research questions and friction points this paper is trying to address.

Organizing large volumes of citizen proposals on digital participation platforms
Overcoming the limits of manual classification, which does not scale and requires expert involvement
Ensuring proposal categorization aligns with official government taxonomies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining BERTopic with seed words
Using large language models for validation
Generating institutionally aligned topics automatically
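The LLM-based validation step is not specified in detail here. One plausible shape, sketched under stated assumptions, is to show the model a topic's top keywords together with the official category list, ask for a single category name or INVALID, and then map the free-text reply back onto the taxonomy. The prompt wording, the `build_validation_prompt` and `parse_verdict` helpers, and the category names are hypothetical; the model call itself is omitted because the source names no model or API.

```python
def build_validation_prompt(keywords, taxonomy):
    """Build a prompt asking an LLM to validate a topic against an official taxonomy."""
    categories = "\n".join(f"- {c}" for c in taxonomy)
    return (
        "You are reviewing a topic extracted from citizen proposals.\n"
        f"Top keywords: {', '.join(keywords)}\n"
        "Official categories:\n"
        f"{categories}\n"
        "Reply with exactly one category name from the list, "
        "or INVALID if the keywords do not form a coherent topic."
    )


def parse_verdict(reply, taxonomy):
    """Map a free-text reply to a category name; None means invalid or unrecognized."""
    answer = reply.strip().strip(".").lower()
    for category in taxonomy:
        if answer == category.lower():
            return category
    return None


taxonomy = ["Saúde", "Educação", "Mobilidade Urbana"]  # hypothetical categories
print(build_validation_prompt(["hospital", "médicos", "SUS"], taxonomy))
print(parse_verdict(" saúde.\n", taxonomy))  # Saúde
print(parse_verdict("INVALID", taxonomy))    # None
```

Restricting the reply to a closed list keeps the check deterministic to parse; topics that come back as None can be routed to a human reviewer instead of being silently discarded.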
Ronivaldo Ferreira
Faculdade da Computação, Universidade Federal do Pará (UFPA), Belém – PA – Brazil
Guilherme da Silva
Faculdade do Gama, Universidade de Brasília (UnB), Brasília – DF – Brazil
Carla Rocha
Faculdade do Gama, Universidade de Brasília (UnB), Brasília – DF – Brazil
Gustavo Pinto
UFPA & Zup Innovation
Software Engineering, Refactoring, Software Repositories, ML4SE