Readability Reconsidered: A Cross-Dataset Analysis of Reference-Free Metrics

πŸ“… 2025-10-17
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
Current readability assessment suffers from inconsistent definitions and an overreliance on superficial linguistic features, leading to substantial misalignment between automated metrics and human perception. To address this, we conduct a systematic analysis of 897 human judgments, investigating how information content and topical coherence influence comprehension difficulty. Within a unified evaluation framework, we compare 15 traditional and 6 model-based reference-free readability metrics across five English datasets via cross-dataset validation and rank-correlation analysis (Spearman's ρ). Results show that four model-based, semantically aware metrics consistently place in the top four positions by rank correlation with human judgments, whereas the best-performing traditional metric achieves only an average rank of 8.6. These findings provide empirical evidence that semantic awareness is critical for aligning automatic readability modeling with human perception, and point to model-based approaches as a methodological path toward human-aligned readability assessment.
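
To make the evaluation protocol concrete, here is a minimal sketch of how rank correlations and average metric ranks of this kind can be computed. The dataset names, metric names, and random scores below are hypothetical placeholders, and ranking metrics by absolute ρ is one plausible rule; the paper's actual implementation is not shown on this page.

```python
# Hedged sketch of a Spearman rank-correlation evaluation protocol.
# All names and data are hypothetical stand-ins, not the paper's metrics.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
datasets = ["d1", "d2", "d3", "d4", "d5"]           # stand-ins for the 5 datasets
metrics = ["trad_a", "trad_b", "model_a", "model_b"]  # stand-ins for the 21 metrics

# human[d]: human readability judgments; scores[d][m]: metric m's scores on d.
human = {d: rng.normal(size=100) for d in datasets}
scores = {d: {m: rng.normal(size=100) for m in metrics} for d in datasets}

# On each dataset, rank metrics by |rho| against human judgments,
# then average each metric's rank across datasets.
avg_rank = {m: [] for m in metrics}
for d in datasets:
    rhos = {}
    for m in metrics:
        rho, _ = spearmanr(scores[d][m], human[d])
        rhos[m] = abs(rho)
    ordered = sorted(metrics, key=lambda m: rhos[m], reverse=True)
    for rank, m in enumerate(ordered, start=1):
        avg_rank[m].append(rank)

for m in metrics:
    print(m, np.mean(avg_rank[m]))
```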


πŸ“ Abstract
Automatic readability assessment plays a key role in ensuring effective and accessible written communication. Despite significant progress, the field is hindered by inconsistent definitions of readability and measurements that rely on surface-level text properties. In this work, we investigate the factors shaping human perceptions of readability through the analysis of 897 judgments, finding that, beyond surface-level cues, information content and topic strongly shape text comprehensibility. Furthermore, we evaluate 15 popular readability metrics across five English datasets, contrasting them with six more nuanced, model-based metrics. Our results show that four model-based metrics consistently place among the top four in rank correlations with human judgments, while the best-performing traditional metric achieves an average rank of 8.6. These findings highlight a mismatch between current readability metrics and human perceptions, pointing to model-based approaches as a more promising direction.
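
As a concrete example of the "surface-level text properties" that traditional formulas rely on, below is a minimal sketch of the classic Flesch Reading Ease score, which looks only at sentence length and a heuristic syllable count. Whether this exact formula is among the 15 metrics evaluated is not stated on this page, so it serves purely as a representative example.

```python
# Illustrative surface-level metric: Flesch Reading Ease,
#   206.835 - 1.015 * (words/sentences) - 84.6 * (syllables/words).
# The syllable counter is a crude vowel-group heuristic, adequate only
# to show that such formulas never look at meaning.
import re

def count_syllables(word: str) -> int:
    # Approximate syllables as runs of vowels (a rough but common heuristic).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (n_words / sentences) - 84.6 * (n_syllables / n_words)

print(flesch_reading_ease("The cat sat on the mat. It was warm."))
```

Because the formula sees only counts, two texts with identical word and syllable statistics receive the same score regardless of meaning, which is precisely the limitation that motivates the model-based metrics studied here.
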
Problem

Research questions and friction points this paper is trying to address.

Investigating factors shaping human perceptions of text readability
Evaluating traditional versus model-based readability metrics across datasets
Addressing mismatch between current metrics and human readability judgments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model-based metrics outperform traditional readability assessments
Analyzed 897 human judgments to identify comprehensibility factors
Evaluated 15 metrics across five datasets using rank correlations