🤖 AI Summary
This study addresses the scarcity of reliable ESG ratings for smaller companies and emerging markets by constructing the first publicly available Slovene-language ESG news sentiment dataset, built from the MaCoCu Slovene news collection through large language model (LLM)-assisted filtering and manual annotation. The authors evaluate several approaches across the environmental, social, and governance dimensions: SloBERTa, XLM-R, TabPFN embedding-based classifiers, hierarchical ensembles, and LLMs such as Gemma3-27B and gpt-oss 20B. LLMs perform best on the environmental (Gemma3-27B, F1-macro: 0.61) and social (gpt-oss 20B, F1-macro: 0.45) dimensions, while fine-tuned SloBERTa performs best on governance (F1-macro: 0.54). A small case study then applies gpt-oss to examine ESG aspects of selected companies over a long time frame.
📝 Abstract
Environmental, Social, and Governance (ESG) considerations are increasingly integral to assessing corporate performance, reputation, and long-term sustainability. Yet, reliable ESG ratings remain limited for smaller companies and emerging markets. We introduce the first publicly available Slovene ESG sentiment dataset and a suite of models for automatic ESG sentiment detection. The dataset, derived from the MaCoCu Slovene news collection, combines large language model (LLM)-assisted filtering with human annotation of company-related ESG content. We evaluate the performance of monolingual (SloBERTa) and multilingual (XLM-R) models, embedding-based classifiers (TabPFN), hierarchical ensemble architectures, and large language models. Results show that LLMs achieve the strongest performance on Environmental (Gemma3-27B, F1-macro: 0.61) and Social aspects (gpt-oss 20B, F1-macro: 0.45), while fine-tuned SloBERTa is the best model on Governance classification (F1-macro: 0.54). We then show in a small case study how the best-performing classifier (gpt-oss) can be applied to investigate ESG aspects for selected companies across a long time frame.
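For readers unfamiliar with the reported metric, the following is a minimal sketch of how a macro-averaged F1 score is computed: F1 is calculated per class and then averaged with equal weight, so rare classes count as much as frequent ones. The label names (`negative`, `neutral`, `positive`) and the toy predictions are assumptions made for illustration only; the abstract does not specify the label set, and this is not the authors' evaluation code.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical sentiment labels and toy predictions (illustrative only)
LABELS = ["negative", "neutral", "positive"]
y_true = ["positive", "positive", "negative", "neutral"]
y_pred = ["positive", "negative", "negative", "neutral"]

score = macro_f1(y_true, y_pred, LABELS)
print(f"F1-macro: {score:.2f}")
```

Because every class contributes equally to the average, a model that ignores a minority sentiment class is penalized even if its overall accuracy is high.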