Towards Empowering Consumers through Sentence-level Readability Scoring in German ESG Reports

📅 2026-03-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of fine-grained evaluation of how readable German ESG reports are for non-specialist audiences, despite their public orientation. To bridge this gap, the work extends an existing sentence-level dataset of German ESG reports with crowdsourced readability annotations and systematically evaluates a range of automated scoring approaches on prediction error and correlation with human rankings. A small fine-tuned Transformer model predicts human readability judgments with the lowest error, while LLM prompting shows potential for separating clear from hard-to-read sentences. Averaging the predictions of multiple models yields slight gains in performance at the cost of slower inference. The annotated dataset and evaluation offer a reference point for assessing the readability of German ESG texts.

📝 Abstract

With the ever-growing urgency of sustainability in the economy and society, and the massive stream of information that comes with it, consumers need reliable access to that information. To address this need, companies began publishing so-called Environmental, Social, and Governance (ESG) reports, both voluntarily and as required by law. To serve the public, these reports must be addressed not only to financial experts but also to non-expert audiences. But are they written clearly enough? In this work, we extend an existing sentence-level dataset of German ESG reports with crowdsourced readability annotations. We find that, in general, native speakers perceive sentences in ESG reports as easy to read, but also that readability is subjective. We apply various readability scoring methods and evaluate them with respect to their prediction error and their correlation with human rankings. Our analysis shows that, while LLM prompting has potential for distinguishing clear from hard-to-read sentences, a small fine-tuned transformer predicts human readability with the lowest error. Averaging the predictions of multiple models can slightly improve performance at the cost of slower inference.
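The evaluation described in the abstract (prediction error, correlation with human rankings, and averaging the predictions of several models) can be illustrated with a minimal sketch. The abstract does not name the exact metrics, so RMSE and Spearman rank correlation are assumptions here, and the scores, the rating scale, and the names `model_a` and `model_b` are hypothetical toy values, not data from the paper.

```python
from math import sqrt


def rmse(pred, gold):
    """Root mean squared error between predicted and human scores."""
    return sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold))


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(pred, gold):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(pred), average_ranks(gold))


def ensemble_mean(predictions):
    """Average the per-sentence predictions of several models."""
    return [sum(column) / len(column) for column in zip(*predictions)]


# Hypothetical per-sentence scores (toy values, not from the paper).
human = [1.0, 2.0, 3.0, 4.0, 5.0]
model_a = [1.5, 2.5, 2.5, 4.5, 4.5]
model_b = [0.5, 1.5, 3.5, 3.5, 5.5]
ensemble = ensemble_mean([model_a, model_b])

for name, pred in [("model_a", model_a), ("model_b", model_b), ("ensemble", ensemble)]:
    print(f"{name}: RMSE={rmse(pred, human):.3f}, Spearman={spearman(pred, human):.3f}")
```

In this toy setup the errors of the two models happen to cancel, so the averaged prediction looks better than either model alone. Real gains are typically much smaller, and every additional model adds inference cost, which matches the trade-off the abstract points out.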

Problem

Research questions and friction points this paper is trying to address.

readability

ESG reports

sentence-level

consumer accessibility

German

Innovation

Methods, ideas, or system contributions that make the work stand out.

sentence-level readability

German ESG reports

crowdsourced annotations