CitiLink-Summ: Summarization of Discussion Subjects in European Portuguese Municipal Meeting Minutes

📅 2026-02-18
🤖 AI Summary
This study addresses the lack of publicly available automatic summarization systems for European Portuguese municipal council minutes, a domain characterized by lengthy, complex administrative texts and, given its low-resource setting, scarce high-quality annotated data. To bridge this gap, the authors introduce CitiLink-Summ, the first summarization dataset for European Portuguese council minutes, comprising 100 meeting records paired with 2,322 human-written summaries, each covering a distinct discussion subject. They establish baselines using state-of-the-art generative models, including BART and PRIMERA, as well as large language models (LLMs). Evaluation with ROUGE, BLEU, METEOR, and BERTScore demonstrates the dataset's utility, offering both a foundational resource and a benchmark for advancing natural language processing on complex governmental discourse under low-resource conditions.

📝 Abstract
Municipal meeting minutes are formal records documenting the discussions and decisions of local government, yet their content is often lengthy, dense, and difficult for citizens to navigate. Automatic summarization can help address this challenge by producing concise summaries for each discussion subject. Despite its potential, research on summarizing discussion subjects in municipal meeting minutes remains largely unexplored, especially in low-resource languages, where the inherent complexity of these documents adds further challenges. A major bottleneck is the scarcity of datasets containing high-quality, manually crafted summaries, which limits the development and evaluation of effective summarization models for this domain. In this paper, we present CitiLink-Summ, a new corpus of European Portuguese municipal meeting minutes, comprising 100 documents and 2,322 manually written summaries, each corresponding to a distinct discussion subject. Leveraging this dataset, we establish baseline results for automatic summarization in this domain, employing state-of-the-art generative models (e.g., BART, PRIMERA) as well as large language models (LLMs), evaluated with both lexical and semantic metrics such as ROUGE, BLEU, METEOR, and BERTScore. CitiLink-Summ provides the first benchmark for municipal-domain summarization in European Portuguese, offering a valuable resource for advancing NLP research on complex administrative texts.
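The abstract lists ROUGE among the lexical metrics used for the baselines. As a rough illustration of what such a metric measures, the sketch below computes ROUGE-N precision, recall, and F1 as clipped n-gram overlap between a generated summary and a single reference summary. It is not the authors' evaluation code: the function name `rouge_n`, the naive lowercase whitespace tokenisation, and the example sentences are assumptions made for illustration. Reported scores normally come from an established ROUGE implementation with its own tokenisation and stemming rules.

```python
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """Hypothetical helper: ROUGE-N against a single reference summary.

    Assumption: tokens are produced by a naive lowercase whitespace split;
    real ROUGE toolkits apply their own tokenisation and optional stemming.
    """

    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        # Count every contiguous n-gram; texts shorter than n yield nothing.
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    # Counter intersection keeps the minimum count of each n-gram, so an
    # n-gram repeated in the candidate is only credited as often as it
    # appears in the reference (clipping).
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up Portuguese sentences, not taken from CitiLink-Summ.
    generated = "a câmara aprovou o orçamento"
    reference = "a câmara municipal aprovou o orçamento por unanimidade"
    # Unigrams: precision 1.0, recall 0.625, F1 approximately 0.769.
    print(rouge_n(generated, reference, n=1))
    print(rouge_n(generated, reference, n=2))
```

In this sketch, recall rewards covering the reference's content while precision penalises extra words. Semantic metrics such as BERTScore instead compare contextual embeddings, so a paraphrased summary can score well there even with little n-gram overlap.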
Problem

municipal meeting minutes
automatic summarization
low-resource languages
discussion subjects
dataset scarcity
Innovation

municipal meeting summarization
low-resource language
manual summary corpus
European Portuguese
automatic summarization benchmark

Miguel Marques
University of Beira Interior, Covilhã, Portugal; INESC TEC, Porto, Portugal

Ana Luísa Fernandes
Universidade do Porto, Porto, Portugal; INESC TEC, Porto, Portugal

Ana Filipa Pacheco
Universidade do Porto, Porto, Portugal; INESC TEC, Porto, Portugal

Rute Rebouças
Universidade do Porto, Porto, Portugal; INESC TEC, Porto, Portugal

Inês Cantante
Universidade do Porto, Porto, Portugal; INESC TEC, Porto, Portugal

José Isidro
Universidade do Porto, Porto, Portugal; INESC TEC, Porto, Portugal

Luís Filipe Cunha
Universidade do Porto, Porto, Portugal; INESC TEC, Porto, Portugal

Alípio Jorge
University of Porto, FCUP, DCC, INESC TEC, LIAAD
Machine Learning, NLP, Narrative Extraction, Recommender Systems, Artificial Intelligence

Nuno Guimarães
Universidade do Porto, Porto, Portugal; INESC TEC, Porto, Portugal

Sérgio Nunes
INESC TEC and Faculty of Engineering, University of Porto, Portugal
Information Retrieval, Information Management, Information Systems, Web Technologies

António Leal
University of Macau; University of Porto (on leave); CLUP
Semantics

Purificação Silvano
Faculdade de Letras da Universidade do Porto
Linguistics, Semantics, Corpora Annotation, Discourse

Ricardo Campos
Universidade da Beira Interior
Natural Language Processing, Data Science, Information Retrieval