NLG Evaluation: Past, Present, Future

2026-05-22
Citations: 0
Influential: 0

AI Summary
This study addresses the lack of systematic evolutionary analysis in current natural language generation (NLG) evaluation, which hinders its ability to meet future assessment demands along dimensions of impact, qualitative understanding, and safety. The work presents the first comprehensive historical synthesis of NLG evaluation since 1990, integrating retrospective review with forward-looking trend analysis across human evaluation, automatic metrics, and emerging approaches such as LLM-as-Judge. It reveals a paradigmatic shift from linguistically oriented, non-experimental methodologies toward machine learning–driven, experimentally grounded frameworks. Furthermore, the paper prospectively identifies impact, qualitative insight, and safety as pivotal dimensions for next-generation NLG evaluation, offering a theoretical foundation and strategic direction for developing more robust and holistic assessment systems.
Abstract
Natural Language Generation (NLG) evaluation has changed dramatically since 1990, and will continue to evolve in the future. In 1990, when NLG had close ties to linguistics, there was very little formal experimental evaluation in the modern sense. In 2026, when NLG is closely linked to machine learning, experimental evaluation is expected and indeed fundamental to research. Many evaluation techniques were developed over this period, including most recently LLM-as-Judge. I expect NLG evaluation will continue to evolve in the future. In particular, impact, qualitative, and safety evaluation will become more important as large numbers of people routinely use NLG technology.
Problem

Research questions and friction points this paper is trying to address.

NLG evaluation
impact evaluation
qualitative evaluation
safety evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

NLG evaluation
LLM-as-Judge
qualitative evaluation
safety evaluation
impact assessment