🤖 AI Summary
Existing lexical simplification (LS) evaluation methods focus on substituting individual difficult words and therefore cannot assess sentence-level simplification quality, particularly contextual modeling and stepwise simplification. This work proposes an end-to-end, sentence-level LS evaluation paradigm tailored to large language models (LLMs). The authors design a human-in-the-loop, full-coverage annotation protocol and develop a multi-LLM collaborative framework that explicitly simulates the three-stage LS process (complex word identification, substitute generation, and substitute ranking), thereby overcoming the limitations of single-prompt simplification. On a newly constructed benchmark, the method significantly outperforms all baseline approaches, providing what the authors describe as the first systematic, reproducible assessment of LLMs' holistic sentence-simplification capability and empirically validating the proposed end-to-end evaluation paradigm.
📝 Abstract
Lexical Simplification (LS) methods use a three-step pipeline: complex word identification, substitute generation, and substitute ranking, each with separate evaluation datasets. We found that large language models (LLMs) can simplify sentences directly with a single prompt, bypassing the traditional pipeline. However, existing LS datasets are not suitable for evaluating these LLM-generated simplified sentences, as they provide substitutes for single complex words without identifying all complex words in a sentence. To address this gap, we propose a new annotation method for constructing an all-in-one LS dataset through human-machine collaboration. Automated methods generate a pool of potential substitutes, which human annotators then assess, suggesting additional alternatives as needed. Additionally, we explore LLM-based methods with single prompts, in-context learning, and chain-of-thought techniques. We introduce a multi-LLM collaboration approach that simulates each step of the LS task. Experimental results demonstrate that the multi-LLM approach significantly outperforms existing baselines.
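The three-step pipeline the abstract describes (identification, substitute generation, substitute ranking) can be sketched as a chain of separate model calls. The sketch below is a hypothetical illustration, not the paper's implementation: `LLM` stands in for any text-in/text-out model interface, and the prompt strings, comma-separated reply format, and function names are all assumptions made for the example.

```python
from typing import Callable, List

# An "LLM" here is any callable that maps a prompt string to a reply string.
LLM = Callable[[str], str]


def identify_complex_words(sentence: str, llm: LLM) -> List[str]:
    """Stage 1: ask one model to list the complex words in the sentence."""
    reply = llm(f"List the complex words in: {sentence}")
    return [w.strip() for w in reply.split(",") if w.strip()]


def generate_substitutes(sentence: str, word: str, llm: LLM) -> List[str]:
    """Stage 2: ask a second model for simpler, context-preserving substitutes."""
    reply = llm(f"Suggest simpler substitutes for '{word}' in: {sentence}")
    return [w.strip() for w in reply.split(",") if w.strip()]


def rank_and_pick(sentence: str, word: str, candidates: List[str], llm: LLM) -> str:
    """Stage 3: ask a third model to rank the candidates; keep the top one."""
    reply = llm(f"Rank {candidates} as replacements for '{word}' in: {sentence}")
    return reply.split(",")[0].strip()


def simplify(sentence: str, identifier: LLM, generator: LLM, ranker: LLM) -> str:
    """End-to-end: replace every identified complex word with its best substitute."""
    for word in identify_complex_words(sentence, identifier):
        candidates = generate_substitutes(sentence, word, generator)
        if candidates:
            best = rank_and_pick(sentence, word, candidates, ranker)
            sentence = sentence.replace(word, best)
    return sentence
```

Because each stage takes its own model, the same skeleton covers both the single-LLM setting (pass the same callable three times) and the multi-LLM collaboration setting (pass a different model per stage).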