Input Matters: Evaluating Input Structure's Impact on LLM Summaries of Sports Play-by-Play

📅 2025-10-23

📈 Citations: 0

✨ Influential: 0

career value

155K/year

🤖 AI Summary

This study investigates how input structure affects the factual accuracy of live sports commentary summarization generated by large language models (LLMs), with emphasis on hallucination and factual error suppression in high-precision settings. Method: Leveraging structured NBA game-by-game data, we systematically compare row-based, JSON, and unstructured text input formats on Llama-3.1-70B and Qwen2.5-72B. Factuality is evaluated via human annotation and repeated-measures ANOVA with Tukey HSD post-hoc tests to quantify error rates. Contribution/Results: JSON formatting significantly improves factual consistency, reducing error rates by 69% and 65% on the two models, respectively. Input structure accounts for over 80% of the variance in factual errors. This work provides the first quantitative evidence that input structure is the dominant factor governing LLM factual accuracy—establishing a critical engineering principle for high-reliability generative applications.

Technology Category

Application Category

📝 Abstract

A major concern when deploying LLMs in accuracy-critical domains such as sports reporting is that the generated text may not faithfully reflect the input data. We quantify how input structure affects hallucinations and other factual errors in LLM-generated summaries of NBA play-by-play data, across three formats: row-structured, JSON and unstructured. We manually annotated 3,312 factual errors across 180 game summaries produced by two models, Llama-3.1-70B and Qwen2.5-72B. Input structure has a strong effect: JSON input reduces error rates by 69% for Llama and 65% for Qwen compared to unstructured input, while row-structured input reduces errors by 54% for Llama and 51% for Qwen. A two-way repeated measures ANOVA shows that input structure accounts for over 80% of the variance in error rates, with Tukey HSD post hoc tests confirming statistically significant differences between all input formats.

Problem

Research questions and friction points this paper is trying to address.

Evaluating how input structure affects LLM factual errors

Measuring hallucinations in sports play-by-play summaries

Comparing JSON, row-structured and unstructured input formats

Innovation

Methods, ideas, or system contributions that make the work stand out.

JSON input reduces factual errors by over 65%

Row-structured input cuts error rates by over 50%

Input structure accounts for 80% of error variance

🔎 Similar Papers

Questioning Internal Knowledge Structure of Large Language Models Through the Lens of the Olympic Games