Benchmarking Linguistic Diversity of Large Language Models

📅 2024-12-13
🏛️ arXiv.org
📈 Citations: 6
✨ Influential: 0
🤖 AI Summary
This work addresses the long-overlooked core problem of insufficient linguistic diversity in large language models (LLMs), whose outputs systematically deviate from the richness of human language. It proposes a comprehensive evaluation framework spanning lexical, syntactic, and semantic dimensions. The methodology introduces a multi-granularity diversity assessment paradigm, featuring an in-depth syntactic diversity case study and quantifying the nonlinear effects of training objectives, decoding strategies (e.g., temperature, top-p), and instruction fine-tuning on diversity. Evaluation integrates multiple metrics: information entropy, n-gram distributional diversity, dependency tree complexity, and semantic dispersion in embedding space. Experimental results reveal pervasive lexical fixation, syntactic simplification, and semantic convergence across mainstream LLMs. These findings provide both theoretical grounding and actionable pathways toward human-level linguistic diversity in generative language modeling.
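Two of the lexical metrics named above, information entropy and n-gram distributional diversity, can be illustrated with a minimal sketch. This is a generic illustration of the metric definitions on a toy word list, not the paper's implementation; the example sentences are invented:

```python
import math
from collections import Counter

def token_entropy(tokens):
    """Shannon entropy (bits) of the unigram distribution over tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def distinct_n(tokens, n):
    """Distinct-n: fraction of n-grams in the text that are unique."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

varied = "the cat sat on the mat while a dog ran past the old gate".split()
repetitive = "the cat sat the cat sat the cat sat the cat sat".split()

# A lexically varied text scores higher on both measures.
print(token_entropy(varied) > token_entropy(repetitive))   # True
print(distinct_n(varied, 2) > distinct_n(repetitive, 2))   # True
```

Lower entropy and lower distinct-n are exactly the "lexical fixation" pattern the summary reports: probability mass concentrated on a few favored words and phrasings.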

πŸ“ Abstract
The development and evaluation of Large Language Models (LLMs) has primarily focused on their task-solving capabilities, with recent models even surpassing human performance in some areas. However, this focus often neglects whether machine-generated language matches the human level of diversity, in terms of vocabulary choice, syntactic construction, and expression of meaning, raising questions about whether the fundamentals of language generation have been fully addressed. This paper emphasizes the importance of examining the preservation of human linguistic richness by language models, given the concerning surge in online content produced or aided by LLMs. We propose a comprehensive framework for evaluating LLMs from various linguistic diversity perspectives including lexical, syntactic, and semantic dimensions. Using this framework, we benchmark several state-of-the-art LLMs across all diversity dimensions, and conduct an in-depth case study for syntactic diversity. Finally, we analyze how different development and deployment choices impact the linguistic diversity of LLM outputs.
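One deployment choice the abstract alludes to, the sampling temperature, has a direct and easily demonstrated effect on output diversity. The sketch below shows the general mechanism (not the paper's code, and with hypothetical logit values): dividing logits by a temperature before the softmax flattens or sharpens the next-token distribution, which raises or lowers its entropy.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; higher temperature flattens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]

def entropy(probs):
    """Shannon entropy (bits) of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

logits = [4.0, 2.0, 1.0, 0.5]  # hypothetical next-token scores

low = entropy(softmax_with_temperature(logits, 0.5))
high = entropy(softmax_with_temperature(logits, 1.5))
print(low < high)  # True: raising temperature spreads probability mass over more tokens
```

Top-p (nucleus) sampling interacts with the same distribution by truncating its low-probability tail, so temperature and top-p jointly bound how much lexical variety sampling can recover.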
Problem

Research questions and friction points this paper is trying to address.

Assessing linguistic diversity in the outputs of Large Language Models
Evaluating diversity along lexical, syntactic, and semantic dimensions
Analyzing how development choices affect the variety of LLM-generated language
Innovation

Methods, ideas, or system contributions that make the work stand out.

A framework for evaluating lexical, syntactic, and semantic diversity
Benchmarks of state-of-the-art LLMs across all diversity dimensions
Analysis of how development and deployment choices impact linguistic diversity