🤖 AI Summary
This work addresses weak table comprehension and the resulting hallucination when large language models generate sports game reports from tabular data. The authors propose Tree-of-Text, a tree-structured prompting framework that guides generation through three stages: Content Planning, which selects relevant operations and arguments from the input tables; Operation Execution, which breaks large tables into manageable sub-tables; and Content Generation, which merges and rewrites short textual outputs into a cohesive report. The method outperforms existing methods on ShuttleSet+, leads on the relation generation (RG) and content ordering (CO) metrics on RotoWire-FG, and excels in content selection (CS) and CO on MLB, at roughly 40% of the time and cost of Chain-of-Table.
📝 Abstract
Generating sports game reports from structured tables is a complex table-to-text task that demands both precise data interpretation and fluent narrative generation. Traditional model-based approaches require large, annotated datasets, while prompt-based methods using large language models (LLMs) often struggle with hallucination due to weak table comprehension. To overcome these challenges, we propose Tree-of-Text, a tree-structured prompting framework that guides LLMs through a three-stage generation process: (1) Content Planning, where relevant operations and arguments are selected from the input tables; (2) Operation Execution, which breaks down large tables into manageable sub-tables; and (3) Content Generation, where short textual outputs are merged and rewritten into a cohesive report. Experiments show that our method outperforms existing methods on ShuttleSet+, leads in RG and CO metrics on RotoWire-FG, and excels in CS and CO on MLB with roughly 40% of the time and cost of Chain-of-Table. These results demonstrate the effectiveness and efficiency of Tree-of-Text and suggest a promising direction for prompt-based table-to-text generation in the sports domain.
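The three stages described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not list the operation set, the prompts, or the stopping rule, so `select_rows`, `plan`, `describe`, the `max_rows` threshold, and the recursive splitting are all hypothetical. Plain functions stand in wherever a real system would call an LLM, and the box score is made-up toy data.

```python
from dataclasses import dataclass, field

Table = list[dict[str, object]]  # a table as a list of row dictionaries


@dataclass
class Node:
    """One tree node: a (sub-)table and the operation that produced it."""
    table: Table
    label: str
    children: list["Node"] = field(default_factory=list)


def select_rows(table: Table, column: str, value: object) -> Table:
    """Hypothetical table operation: keep rows where column == value."""
    return [row for row in table if row[column] == value]


# Assumed operation registry; the paper's actual operations are not given here.
OPERATIONS = {"select_rows": select_rows}


def plan(table: Table) -> list[tuple[str, dict]]:
    """Stage 1 stand-in (Content Planning): choose operations and arguments.

    A real system would prompt an LLM; this stub splits the table by team.
    """
    teams = sorted({row["team"] for row in table})
    return [("select_rows", {"column": "team", "value": t}) for t in teams]


def execute(node: Node, planner=plan, max_rows: int = 2) -> Node:
    """Stage 2 stand-in (Operation Execution): grow the tree of sub-tables."""
    if len(node.table) <= max_rows:
        return node  # small enough to describe directly
    for name, args in planner(node.table):
        sub = OPERATIONS[name](node.table, **args)
        # Only keep strictly smaller, non-empty sub-tables so recursion ends.
        if 0 < len(sub) < len(node.table):
            child = Node(sub, f"{name}({args})")
            node.children.append(execute(child, planner, max_rows))
    return node


def describe(table: Table) -> str:
    """Leaf writer stand-in: an LLM would write a short passage here."""
    return " ".join(
        f"{r['player']} scored {r['points']} for {r['team']}." for r in table
    )


def generate(node: Node, writer=describe, merger=" ".join) -> str:
    """Stage 3 stand-in (Content Generation): merge child texts bottom-up."""
    if not node.children:
        return writer(node.table)
    return merger(generate(c, writer, merger) for c in node.children)


# Made-up toy box score, not taken from any dataset named in the abstract.
box_score = [
    {"team": "Red", "player": "Ann", "points": 30},
    {"team": "Red", "player": "Bo", "points": 12},
    {"team": "Red", "player": "Cy", "points": 8},
    {"team": "Blue", "player": "Dee", "points": 25},
    {"team": "Blue", "player": "Eli", "points": 10},
]

root = execute(Node(box_score, "root"))
report = generate(root)
print(report)
```

In this sketch the merge step is plain string concatenation; the abstract describes merging and rewriting, so a closer approximation would pass an LLM-backed function as `merger`.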