🤖 AI Summary
This work addresses the lack of systematic evaluation of large language models (LLMs) on multi-granularity text simplification, spanning lexical-, syntactic-, sentence-, and document-level tasks. We introduce the first unified benchmark covering all four granularity levels and evaluate lightweight as well as mainstream open- and closed-source LLMs against traditional non-LLM simplification methods, using both automatic metrics (BLEU, SARI) and multidimensional human assessment (readability, conciseness, meaning preservation). Results show that LLMs significantly outperform conventional approaches across all four granularities; notably, outputs from several models surpass the human reference texts in quality, challenging the authority of existing “gold-standard” references. This study fills a critical gap in cross-granularity evaluation of LLM-based simplification and establishes a comprehensive assessment framework that integrates automated metrics with human judgment, providing a new benchmark and methodological foundation for text simplification research.
📝 Abstract
Text simplification (TS) is the process of reducing the complexity of a text while retaining its original meaning and key information. Prior work has shown only that large language models (LLMs) outperform supervised non-LLM methods on sentence simplification. This study offers the first comprehensive analysis of LLM performance across four TS tasks: lexical, syntactic, sentence, and document simplification. We compare lightweight, closed-source, and open-source LLMs against traditional non-LLM methods using automatic metrics and human evaluations. Our experiments reveal that LLMs not only outperform non-LLM approaches in all four tasks but also often generate outputs that exceed the quality of existing human-annotated references. Finally, we outline future directions for TS in the era of LLMs.
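The automatic evaluation mentioned above relies on metrics such as SARI, which scores a simplification against both the source sentence and a reference by rewarding correctly kept, added, and deleted words. As a rough illustration of that idea only (this is not the paper's evaluation code, and it is much simpler than the official SARI, which uses n-grams up to length 4 and multiple references), a unigram, single-reference sketch might look like:

```python
def f1(p, r):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    return 2 * p * r / (p + r) if p + r else 0.0

def simple_sari(source, output, reference):
    """Simplified unigram, single-reference sketch of the SARI idea.

    SARI averages scores for three edit operations judged against the
    source and the reference: keeping, adding, and deleting words.
    Illustration only -- not the official SARI implementation.
    """
    src, out, ref = (set(s.lower().split()) for s in (source, output, reference))

    # KEEP: words retained from the source; good if the reference keeps them too.
    keep_good = src & out & ref
    keep_p = len(keep_good) / len(src & out) if src & out else 0.0
    keep_r = len(keep_good) / len(src & ref) if src & ref else 0.0

    # ADD: words the system introduced; good if the reference adds them too.
    add_good = (out - src) & (ref - src)
    add_p = len(add_good) / len(out - src) if out - src else 0.0
    add_r = len(add_good) / len(ref - src) if ref - src else 0.0

    # DEL: words removed from the source; good if the reference removes them too.
    # The original SARI scores deletion by precision only.
    del_good = (src - out) & (src - ref)
    del_p = len(del_good) / len(src - out) if src - out else 0.0

    return (f1(keep_p, keep_r) + f1(add_p, add_r) + del_p) / 3

source = "the cat perched atop the mat"
reference = "the cat sat on the mat"
print(simple_sari(source, "the cat sat on the mat", reference))  # matches the reference -> 1.0
print(simple_sari(source, source, reference))                    # unchanged source scores lower
```

An output identical to the reference scores 1.0, while simply copying the source is penalized on the add and delete components, which is exactly why SARI is preferred over BLEU for simplification: BLEU rewards overlap with the reference alone and cannot penalize a system that fails to simplify.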