Charting a Decade of Computational Linguistics in Italy: The CLiC-it Corpus

📅 2025-09-23
🤖 AI Summary
This study systematically characterizes the evolution of computational linguistics (CL) and natural language processing (NLP) research in Italy from 2014 to 2024. To address the lack of longitudinal, metadata-rich resources for regional NLP analysis, we construct the CLiC-it Corpus, the first open-source, decade-spanning corpus derived from the first ten editions of the CLiC-it conference, enriched with granular metadata (e.g., author affiliations, gender, geographic location). We apply integrated methodologies including text mining, topic modeling (LDA, BERTopic), statistical analysis, and Transformer-based semantic representations (e.g., sentence embeddings). Key findings include a marked shift from early resource-centric work toward language modeling and multimodal NLP; increasing concentration of scholarly output among a few institutions; persistent geographic and gender imbalances; and clear thematic alignment with the global large language model (LLM) paradigm shift. The corpus and analytical framework provide a reproducible, temporal cartography of Italy’s NLP ecosystem, enabling evidence-based assessment of regional academic development.

📝 Abstract
Over the past decade, Computational Linguistics (CL) and Natural Language Processing (NLP) have evolved rapidly, especially with the advent of Transformer-based Large Language Models (LLMs). This shift has transformed research goals and priorities, from Lexical and Semantic Resources to Language Modelling and Multimodality. In this study, we track the research trends of the Italian CL and NLP community through an analysis of the contributions to CLiC-it, arguably the leading Italian conference in the field. We compile the proceedings from the first 10 editions of the CLiC-it conference (from 2014 to 2024) into the CLiC-it Corpus and provide a comprehensive analysis of both its metadata (author provenance, gender, affiliations, and more) and the content of the papers themselves, which address various topics. Our goal is to provide the Italian and international research communities with valuable insights into emerging trends and key developments over time, supporting informed decisions and future directions in the field.
Problem

Research questions and friction points this paper is trying to address.

Tracking Italian CL/NLP research trends over a decade
Analyzing metadata and content of CLiC-it conference papers
Identifying emerging trends and key developments in Italian computational linguistics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compiling the conference proceedings into a corpus
Analyzing metadata and content comprehensively
Tracking research trends over a decade
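
As a rough illustration of the trend tracking listed above, the sketch below computes the share of papers per topic for each conference year. The record layout (`year`, `topic`), the function name `topic_shares_by_year`, and the toy entries are all assumptions made for this example; they are not the actual CLiC-it Corpus schema or data, and the topic labels are borrowed only from the abstract's wording.

```python
from collections import Counter, defaultdict

# Invented toy records for illustration only; NOT real CLiC-it Corpus data.
# The field names "year" and "topic" are hypothetical.
papers = [
    {"year": 2014, "topic": "Lexical and Semantic Resources"},
    {"year": 2014, "topic": "Lexical and Semantic Resources"},
    {"year": 2014, "topic": "Language Modelling"},
    {"year": 2024, "topic": "Language Modelling"},
    {"year": 2024, "topic": "Language Modelling"},
    {"year": 2024, "topic": "Multimodality"},
    {"year": 2024, "topic": "Lexical and Semantic Resources"},
]


def topic_shares_by_year(records):
    """Return {year: {topic: fraction of that year's papers}}."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec["year"]][rec["topic"]] += 1

    shares = {}
    for year, counter in counts.items():
        total = sum(counter.values())
        shares[year] = {topic: n / total for topic, n in counter.items()}
    return shares


shares = topic_shares_by_year(papers)
for year in sorted(shares):
    for topic, share in sorted(shares[year].items()):
        print(f"{year}  {topic:32s} {share:.2f}")
```

Normalizing by each year's total keeps editions of different sizes comparable, which matters when the number of accepted papers changes from one edition to the next.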
Authors

Chiara Alzetta
Istituto di Linguistica Computazionale "Antonio Zampolli", CNR, Pisa - ItaliaNLP Lab
Serena Auriemma
CoLingLab, Department of Philology, Literature and Linguistics, University of Pisa
Alessandro Bondielli
Department of Computer Science, University of Pisa; CoLingLab, Department of Philology, Literature and Linguistics, University of Pisa
Luca Dini
University of Pisa; Istituto di Linguistica Computazionale "Antonio Zampolli", CNR, Pisa - ItaliaNLP Lab
Chiara Fazzone
Istituto di Linguistica Computazionale "Antonio Zampolli", CNR, Pisa - ItaliaNLP Lab
Alessio Miaschi
Researcher in Computational Linguistics, ItaliaNLP Lab @ CNR-ILC, Pisa
Natural Language Processing, Computational Linguistics, Language Models, Deep Learning
Martina Miliani
CoLingLab, Department of Philology, Literature and Linguistics, University of Pisa
Marta Sartor
Istituto di Linguistica Computazionale "Antonio Zampolli", CNR, Pisa - ItaliaNLP Lab