LLM for Comparative Narrative Analysis

📅 2025-04-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study systematically evaluates the narrative understanding and analysis capabilities of GPT-3.5, PaLM2, and Llama2—representing leading closed- and open-weight LLMs—under controlled conditions. Method: We employ standardized prompt engineering to isolate model-specific behaviors, conduct cross-model response comparison, and introduce a novel four-dimensional human evaluation framework—assessing consistency, logical coherence, richness, and stance neutrality—with expert annotations serving as the gold standard for fair, reproducible, multi-dimensional quantification. Contribution/Results: Results reveal significant inter-model response disparities under identical prompts, reflecting fundamental differences in underlying reasoning mechanisms and semantic modeling capacities. The work not only exposes a pronounced performance gap between state-of-the-art open and closed LLMs in narrative intelligence but also establishes the first standardized, task-specific evaluation paradigm for narrative capability. This provides a transferable methodological benchmark for fine-grained, capability-oriented LLM assessment.
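The four-dimensional human evaluation described above can be sketched as a simple score-aggregation step. Everything below is illustrative: the dimension names come from the summary, but the 1–5 scale, the example scores, and the function names are assumptions, not the paper's actual data or code.

```python
from statistics import mean

# The paper's four evaluation dimensions (names from the summary above).
DIMENSIONS = ["consistency", "coherence", "richness", "neutrality"]

# Hypothetical expert annotations: each model response is scored 1-5
# on every dimension. These numbers are made up for illustration.
annotations = {
    "GPT-3.5": [{"consistency": 4, "coherence": 5, "richness": 4, "neutrality": 4},
                {"consistency": 5, "coherence": 4, "richness": 4, "neutrality": 5}],
    "PaLM2":   [{"consistency": 4, "coherence": 4, "richness": 3, "neutrality": 4},
                {"consistency": 3, "coherence": 4, "richness": 3, "neutrality": 4}],
    "Llama2":  [{"consistency": 3, "coherence": 3, "richness": 3, "neutrality": 4},
                {"consistency": 3, "coherence": 4, "richness": 2, "neutrality": 3}],
}

def dimension_means(scores):
    """Average each dimension over all annotated responses for one model."""
    return {d: mean(s[d] for s in scores) for d in DIMENSIONS}

# Per-model, per-dimension means: the kind of table that supports
# the cross-model comparison reported in the paper.
summary = {model: dimension_means(scores) for model, scores in annotations.items()}
for model, dims in summary.items():
    print(model, dims)
```

Aggregating per dimension rather than into a single overall score is what makes the comparison "multi-dimensional": a model can lead on coherence while trailing on richness, and that pattern survives into the summary table.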

📝 Abstract
In this paper, we conducted a Multi-Perspective Comparative Narrative Analysis (CNA) on three prominent LLMs: GPT-3.5, PaLM2, and Llama2. We applied identical prompts and evaluated their outputs on specific tasks, ensuring an equitable and unbiased comparison between various LLMs. Our study revealed that the three LLMs generated divergent responses to the same prompt, indicating notable discrepancies in their ability to comprehend and analyze the given task. Human evaluation served as the gold standard, with four perspectives used to analyze differences in LLM performance.
Problem

Research questions and friction points this paper is trying to address.

Compare performance of GPT-3.5, PaLM2, and Llama2 on narrative tasks
Evaluate LLM responses using identical prompts for unbiased analysis
Assess comprehension discrepancies in LLMs via human evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Perspective Comparative Narrative Analysis
Identical prompts for unbiased LLM comparison
Human evaluation as gold standard
Leo Kampen
MCS Department, Gustavus Adolphus College
Carlos Rabat Villarreal
Department of CSSE, Auburn University
Louis Yu
MCS Department, Gustavus Adolphus College
Santu Karmaker
Bridge-AI Lab, Department of CS, University of Central Florida
Dongji Feng
California State University, Monterey Bay
Information Retrieval · NLU · Evaluation