🤖 AI Summary
Problem: Manual tracking of Sustainable Development Goal (SDG)-related texts is hindered by their large scale and semantic complexity.
Method: This study systematically evaluates how well mainstream large language models (LLMs), both open-source and commercial, can be adapted to single-label, multi-class SDG text classification, using prompt engineering, zero-shot and few-shot learning, and lightweight fine-tuning.
Contribution/Results: Optimized small open-source LLMs, specifically Phi-3 and Qwen2, achieve classification accuracy comparable to or exceeding that of GPT-4, with up to a 12.3% absolute improvement in SDG label accuracy. These findings challenge the prevailing "bigger is better" assumption in LLM deployment and support lightweight adaptation as a viable strategy. The approach offers an efficient, deployable option for SDG monitoring in resource-constrained settings, balancing performance, computational cost, and accessibility.
📝 Abstract
In 2015, the United Nations adopted 17 Sustainable Development Goals (SDGs) aimed at creating a better and more sustainable future by 2030. However, tracking progress toward these goals is difficult because of the scale and complexity of the data involved. Text classification models have become vital tools in this area, automating the analysis of large volumes of text from diverse sources. In addition, large language models (LLMs) have recently proven indispensable for many natural language processing tasks, including text classification, thanks to their ability to capture complex linguistic patterns and semantics. This study analyzes various proprietary and open-source LLMs on a single-label, multi-class text classification task focused on the SDGs. It also evaluates the effectiveness of task adaptation techniques in this domain, namely the in-context learning approaches Zero-Shot and Few-Shot Learning, as well as Fine-Tuning. The results reveal that smaller models, when optimized through prompt engineering, can perform on par with larger models such as OpenAI's GPT (Generative Pre-trained Transformer).
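The zero-shot and few-shot setups described above differ only in whether labeled examples are placed in the prompt. Below is a minimal sketch of that idea for the 17-class task. It assumes a generic text-in/text-out LLM whose call is omitted. The prompt wording and the helpers `build_prompt`, `parse_label`, and `accuracy` are hypothetical illustrations, not the authors' prompts or code. Fine-tuning is not covered.

```python
import re
from typing import Optional, Sequence

# Short titles of the 17 UN Sustainable Development Goals, in order (SDG 1..17).
SDG_TITLES = [
    "No Poverty", "Zero Hunger", "Good Health and Well-being", "Quality Education",
    "Gender Equality", "Clean Water and Sanitation", "Affordable and Clean Energy",
    "Decent Work and Economic Growth", "Industry, Innovation and Infrastructure",
    "Reduced Inequalities", "Sustainable Cities and Communities",
    "Responsible Consumption and Production", "Climate Action", "Life Below Water",
    "Life on Land", "Peace, Justice and Strong Institutions", "Partnerships for the Goals",
]


def build_prompt(text: str, examples: Sequence[tuple[str, int]] = ()) -> str:
    """Hypothetical prompt builder: zero-shot if `examples` is empty, few-shot otherwise."""
    label_list = "\n".join(f"SDG {i}: {t}" for i, t in enumerate(SDG_TITLES, start=1))
    parts = [
        "Classify the text into exactly one Sustainable Development Goal.",
        "Answer with 'SDG <number>' only.",
        label_list,
    ]
    # Few-shot: each labeled example is shown in the same format as the final query.
    for ex_text, ex_label in examples:
        parts.append(f"Text: {ex_text}\nAnswer: SDG {ex_label}")
    parts.append(f"Text: {text}\nAnswer:")
    return "\n\n".join(parts)


def parse_label(output: str) -> Optional[int]:
    """Map a free-text model reply to a single SDG number (1-17), or None if unusable."""
    match = re.search(r"SDG\s*(\d+)", output)
    if match is None:
        return None
    label = int(match.group(1))
    return label if 1 <= label <= len(SDG_TITLES) else None


def accuracy(predicted: Sequence[Optional[int]], gold: Sequence[int]) -> float:
    """Single-label accuracy; unparseable replies (None) count as errors."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


if __name__ == "__main__":
    shots = [("Expanding solar capacity in rural electricity grids.", 7)]
    print(build_prompt("Reducing child mortality through vaccination.", shots))
    replies = ["SDG 3", "The answer is SDG 13.", "unsure"]  # stand-ins for model replies
    preds = [parse_label(r) for r in replies]
    print(preds, accuracy(preds, [3, 13, 4]))
```

Because the task is single-label, the parser keeps only the first `SDG <n>` mention and treats out-of-range or missing labels as wrong predictions. This is one simple convention among several possible ones.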