🤖 AI Summary
Existing benchmarks for narrative understanding leave large coverage gaps: we estimate that only 27% of narrative tasks are well captured by current evaluations, and core dimensions such as narrative events, style, perspective, and revelation are nearly absent. Constitutively subjective, perspectival tasks, for which there is generally no single correct answer, are also largely neglected. Method: We develop NarraBench, a theory-informed taxonomy of narrative-understanding tasks, and use it to systematically survey 78 existing benchmarks, quantifying how well each area of the taxonomy is covered and how well existing metrics align with each task. Contribution/Results: The taxonomy, survey, and methodology identify where new evaluations are most needed, make the case for benchmarks that can assess subjective and perspectival aspects of narrative, and give NLP researchers a theory-grounded yet practically applicable basis for testing the narrative understanding of large language models.
📝 Abstract
We present NarraBench, a theory-informed taxonomy of narrative-understanding tasks, as well as an associated survey of 78 existing benchmarks in the area. We find a significant need for new evaluations covering aspects of narrative understanding that are either overlooked in current work or poorly aligned with existing metrics. Specifically, we estimate that only 27% of narrative tasks are well captured by existing benchmarks, and we note that some areas -- including narrative events, style, perspective, and revelation -- are nearly absent from current evaluations. We also note the need for increased development of benchmarks capable of assessing constitutively subjective and perspectival aspects of narrative, that is, aspects for which there is generally no single correct answer. Our taxonomy, survey, and methodology are of value to NLP researchers seeking to test LLM narrative understanding.
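The headline 27% figure is a coverage estimate obtained by mapping surveyed benchmarks onto the tasks in the taxonomy. The sketch below is not the paper's actual scoring procedure; it only illustrates one plausible way such an estimate could be computed, assuming a hypothetical alignment matrix in which each taxonomy task is scored for how well each surveyed benchmark captures it. All task names, benchmark names, scores, and the threshold are invented for demonstration.

```python
# Hypothetical illustration of a taxonomy-coverage estimate.
# Task names, benchmark names, and alignment scores are invented for
# demonstration; they are not data from the NarraBench survey.

# alignment[task][benchmark] in [0, 1]: how well that benchmark captures the task.
alignment = {
    "event structure":    {"BenchA": 0.2, "BenchB": 0.1},
    "style":              {"BenchA": 0.0, "BenchC": 0.3},
    "perspective":        {"BenchB": 0.1},
    "revelation":         {},
    "character":          {"BenchA": 0.9, "BenchC": 0.8},
    "plot summarization": {"BenchB": 0.85},
}

WELL_CAPTURED = 0.7  # assumed threshold for counting a task as well covered


def coverage(alignment, threshold=WELL_CAPTURED):
    """Fraction of taxonomy tasks whose best-aligned benchmark meets the threshold."""
    covered = sum(
        1
        for scores in alignment.values()
        if scores and max(scores.values()) >= threshold
    )
    return covered / len(alignment)


if __name__ == "__main__":
    print(f"Estimated coverage: {coverage(alignment):.0%}")  # 2 of 6 tasks -> 33%
```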