LLM for Comparative Narrative Analysis

📅 2025-04-11

📈 Citations: 0

✨ Influential: 0

career value

159K/year

🤖 AI Summary

This study systematically evaluates the narrative understanding and analysis capabilities of GPT-3.5, PaLM2, and Llama2—representing leading closed- and open-weight LLMs—under controlled conditions. Method: We employ standardized prompt engineering to isolate model-specific behaviors, conduct cross-model response comparison, and introduce a novel four-dimensional human evaluation framework—assessing consistency, logical coherence, richness, and stance neutrality—with expert annotations serving as the gold standard for fair, reproducible, multi-dimensional quantification. Contribution/Results: Results reveal significant inter-model response disparities under identical prompts, reflecting fundamental differences in underlying reasoning mechanisms and semantic modeling capacities. The work not only exposes a pronounced performance gap between state-of-the-art open and closed LLMs in narrative intelligence but also establishes the first standardized, task-specific evaluation paradigm for narrative capability. This provides a transferable methodological benchmark for fine-grained, capability-oriented LLM assessment.

Technology Category

Application Category

📝 Abstract

In this paper, we conducted a Multi-Perspective Comparative Narrative Analysis (CNA) on three prominent LLMs: GPT-3.5, PaLM2, and Llama2. We applied identical prompts and evaluated their outputs on specific tasks, ensuring an equitable and unbiased comparison between various LLMs. Our study revealed that the three LLMs generated divergent responses to the same prompt, indicating notable discrepancies in their ability to comprehend and analyze the given task. Human evaluation was used as the gold standard, evaluating four perspectives to analyze differences in LLM performance.

Problem

Research questions and friction points this paper is trying to address.

Compare performance of GPT-3.5, PaLM2, and Llama2 on narrative tasks

Evaluate LLM responses using identical prompts for unbiased analysis

Assess comprehension discrepancies in LLMs via human evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Perspective Comparative Narrative Analysis

Identical prompts for unbiased LLM comparison

Human evaluation as gold standard

🔎 Similar Papers

Says Who? Effective Zero-Shot Annotation of Focalization