AraFinNews: Arabic Financial Summarisation with Domain-Adapted LLMs

📅 2025-11-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the impact of domain-specific pretraining on abstractive summarisation quality for Arabic financial news. Addressing deficiencies in existing datasets and models, particularly regarding factual accuracy, numerical reliability, and stylistic consistency with financial discourse, the authors introduce AraFinNews, the largest publicly available Arabic financial news summarisation dataset to date (212,500 article–headline pairs spanning October 2015 to July 2025). They further propose FinAraT5, a domain-adapted model built on the AraT5 architecture, with pretraining focused on financial terminology, numeric expressions, and key domain entities; this improves factual consistency, handling of quantitative information, and professional narrative style in generated summaries. Experiments show that FinAraT5 outperforms both multilingual and monolingual general-purpose baselines across automatic and human evaluation metrics. The study establishes a reproducible benchmark and an effective paradigm for low-resource, domain-specific NLP in finance.

📝 Abstract
This paper investigates the impact of domain specificity on abstractive summarisation of Arabic financial texts using large language models (LLMs). We introduce AraFinNews, the largest publicly available Arabic financial news dataset to date, comprising 212,500 article–headline pairs spanning nearly a decade of reporting from October 2015 to July 2025. Designed as the Arabic equivalent of major English summarisation corpora such as CNN/DailyMail, AraFinNews provides a robust benchmark for evaluating domain-specific language understanding and generation in financial contexts. Using this resource, we evaluate transformer-based models -- including mT5, AraT5, and the domain-adapted FinAraT5 -- to examine how financial-domain pretraining influences factual accuracy, numerical reliability, and stylistic alignment with professional reporting. Experimental results show that domain-adapted models generate more faithful and coherent summaries, particularly in handling quantitative and entity-centric information. The findings highlight the importance of domain-specific adaptation for improving factual consistency and narrative fluency in Arabic financial summarisation. The dataset is freely available for non-commercial research at https://github.com/ArabicNLP-UK/AraFinNews.
Problem

Research questions and friction points this paper is trying to address.

Domain-specific adaptation for Arabic financial summarisation
Evaluating factual accuracy and numerical reliability in summaries
Improving coherence and fluency in financial text generation
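The automatic side of the evaluation questions above is typically scored with n-gram overlap metrics such as ROUGE. As a minimal illustration of that family of metrics (not the paper's actual evaluation code; tokenisation and examples are hypothetical), unigram ROUGE-1 F1 can be sketched in plain Python:

```python
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram ROUGE-1 F1: token overlap between a reference
    headline and a generated summary (whitespace tokenisation)."""
    ref = Counter(reference.split())
    cand = Counter(candidate.split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Toy example (hypothetical strings, not drawn from AraFinNews):
print(round(rouge1_f1("profits rose 5 percent in 2023",
                      "profits rose 5 percent"), 3))  # → 0.8
```

Real evaluations would use a proper Arabic tokeniser and the full ROUGE-1/2/L suite, but the precision-recall trade-off is the same.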
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-adapted LLMs for Arabic financial summarisation
AraFinNews, the largest Arabic financial news dataset to date
Financial-domain pretraining improves factual accuracy and coherence
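One way to probe the numerical-reliability claim above is a simple number-consistency check: every numeral in a generated summary should also occur in the source article. A hypothetical sketch of such a check (an illustration, not the authors' evaluation protocol):

```python
import re

# Matches integers and decimals with "." or "," separators, e.g. 4.2 or 1,5
NUM_RE = re.compile(r"\d+(?:[.,]\d+)?")

def unsupported_numbers(source: str, summary: str) -> list:
    """Return numerals appearing in the summary but not in the
    source text -- candidates for numerical hallucination."""
    source_nums = set(NUM_RE.findall(source))
    return [n for n in NUM_RE.findall(summary) if n not in source_nums]

# Toy example (hypothetical strings, not drawn from AraFinNews):
src = "Net profit reached 4.2 million in 2023, up from 3.9 million."
print(unsupported_numbers(src, "Profit hit 4.2 million in 2024"))  # → ['2024']
```

A production check would also normalise Arabic-Indic digits (٠–٩) and spelled-out quantities before comparing.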