🤖 AI Summary
Problem: On Brazil’s national participatory platform, Brasil Participativo, the high volume of citizen proposals makes manual categorization impractical, creates heavy reliance on domain experts, and makes it difficult to align the results with official policy taxonomies.
Method: This paper proposes a low-intervention semantic clustering approach: (1) seed-word-guided BERTopic for domain-aware topic modeling to enhance semantic consistency; and (2) large language model (LLM)-based automated validation of topic validity and institutional alignment.
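Step (1) can be made concrete with a toy Python sketch of seed-word guidance. It is loosely modeled on the guided mode BERTopic exposes through its `seed_topic_list` parameter, but it is not the paper's code or BERTopic's internals. Assumptions: bag-of-words counts stand in for sentence embeddings, the `min_sim` threshold and the 3:1 document-to-seed weighting are illustrative choices, and all function names are hypothetical. The sketch stops before the dimensionality reduction and clustering that would run on the adjusted vectors.

```python
import math
from collections import Counter

# Toy stand-in for sentence embeddings (assumption): lowercase bag-of-words counts.
def bow(text):
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term -> weight mappings.
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def blend(doc, seed, w_doc=3.0, w_seed=1.0):
    # Weighted average that pulls a document vector toward its seed topic.
    # The 3:1 weighting is an illustrative assumption.
    keys = set(doc) | set(seed)
    total = w_doc + w_seed
    return {t: (w_doc * doc.get(t, 0) + w_seed * seed.get(t, 0)) / total for t in keys}

def seed_guide(docs, seed_topics, min_sim=0.1):
    """Label each document with its most similar seed topic (-1 if none
    reaches min_sim) and return seed-adjusted vectors for later clustering."""
    seed_vecs = [bow(" ".join(words)) for words in seed_topics]
    labels, vectors = [], []
    for text in docs:
        vec = bow(text)
        sims = [cosine(vec, s) for s in seed_vecs]
        best = max(range(len(sims)), key=sims.__getitem__)
        if sims[best] >= min_sim:
            labels.append(best)
            vectors.append(blend(vec, seed_vecs[best]))
        else:
            labels.append(-1)
            vectors.append(dict(vec))
    return labels, vectors

# Hypothetical proposals and seed-word lists.
docs = [
    "more buses and bike lanes in the city",
    "hire more nurses for public hospitals",
    "fix the potholes",
]
seed_topics = [
    ["transport", "bus", "buses", "bike", "lanes"],
    ["health", "hospital", "hospitals", "nurses"],
]
labels, vectors = seed_guide(docs, seed_topics)
print(labels)  # [0, 1, -1]
```

Documents that match no seed list keep their original vectors, so the later clustering step can still discover topics the seed words did not anticipate.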
Contribution/Results: The method drastically reduces human annotation effort while mitigating the topic drift and the misalignment with official policy taxonomies that affect conventional clustering. Empirical evaluation shows that the generated topics achieve high semantic coherence (coherence > 0.52) and strong mapping accuracy to official classifications (F1 = 0.83), with a 4.7× improvement in processing efficiency. This work provides a reproducible technical pathway for structuring public input at scale in digital democracy contexts.
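The summary does not say how the mapping F1 was computed. One plausible reading, assumed here, is a macro-averaged F1 between the official category implied by each proposal's topic and an expert-assigned category. The topic-to-category mapping, the labels, and the data below are hypothetical and only illustrate the arithmetic.

```python
def macro_f1(gold, pred):
    """Macro-averaged F1 over the official categories that appear in `gold`.
    Predictions outside those categories (e.g. "Unassigned") count only as misses."""
    f1s = []
    for lab in sorted(set(gold)):
        tp = sum(1 for g, p in zip(gold, pred) if g == lab and p == lab)
        fp = sum(1 for g, p in zip(gold, pred) if g != lab and p == lab)
        fn = sum(1 for g, p in zip(gold, pred) if g == lab and p != lab)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / len(f1s)

# Hypothetical topic -> official-category mapping and toy expert labels.
topic_to_category = {0: "Mobility", 1: "Health", -1: "Unassigned"}
topics = [0, 1, 0, -1]
gold = ["Mobility", "Health", "Health", "Education"]
pred = [topic_to_category[t] for t in topics]
print(round(macro_f1(gold, pred), 3))  # 0.444
```

Averaging over the official categories only, rather than over every predicted label, keeps an "Unassigned" bucket from diluting or inflating the score.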
📝 Abstract
Promoting participation on digital platforms such as Brasil Participativo has emerged as a top priority for governments worldwide. However, due to the sheer volume of contributions, much of this engagement goes underutilized, as organizing it presents significant challenges: (1) manual classification is unfeasible at scale; (2) expert involvement is required; and (3) alignment with official taxonomies is necessary. In this paper, we introduce an approach that combines BERTopic with seed words and automatic validation by large language models. Initial results indicate that the generated topics are coherent and institutionally aligned, with minimal human effort. This methodology enables governments to transform large volumes of citizen input into actionable data for public policy.
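The automatic validation step by large language models could look roughly like the following sketch. It assumes an LLM is prompted with a topic's keywords and the official taxonomy and asked for a JSON verdict. The prompt wording, the JSON fields, and `fake_llm` (a stub in place of a real model client) are all hypothetical. The design point is that the model's answer is checked against the taxonomy rather than accepted as-is.

```python
import json
from typing import Callable, Sequence

def build_prompt(keywords: Sequence[str], taxonomy: Sequence[str]) -> str:
    # Hypothetical prompt wording; the paper's actual prompt is not given here.
    return (
        "You are reviewing a topic extracted from citizen proposals.\n"
        f"Topic keywords: {', '.join(keywords)}\n"
        f"Official categories: {'; '.join(taxonomy)}\n"
        'Answer only with JSON: {"valid": true or false, "category": "<one official category or NONE>"}'
    )

def validate_topic(keywords, taxonomy, llm: Callable[[str], str]) -> dict:
    """Ask an LLM whether a topic is coherent and which official category it fits,
    then verify the answer against the taxonomy instead of trusting it blindly."""
    raw = llm(build_prompt(keywords, taxonomy))
    try:
        verdict = json.loads(raw)
    except json.JSONDecodeError:
        return {"valid": False, "category": None, "reason": "unparseable"}
    if not isinstance(verdict, dict):
        return {"valid": False, "category": None, "reason": "unparseable"}
    category = verdict.get("category")
    if category not in taxonomy:  # e.g. "NONE" or an invented category
        return {"valid": False, "category": None, "reason": "off-taxonomy"}
    if verdict.get("valid") is not True:
        return {"valid": False, "category": category, "reason": "rejected"}
    return {"valid": True, "category": category, "reason": "ok"}

# Stub standing in for a real LLM client (hypothetical; any str -> str callable works).
def fake_llm(prompt: str) -> str:
    return '{"valid": true, "category": "Health"}'

taxonomy = ["Health", "Mobility", "Education"]
print(validate_topic(["nurses", "hospitals", "clinics"], taxonomy, fake_llm))
# {'valid': True, 'category': 'Health', 'reason': 'ok'}
```

Topics that come back "rejected", "off-taxonomy", or "unparseable" are natural candidates for the small amount of human review the approach still requires.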