🤖 AI Summary
This study addresses the challenge of factual hallucinations in educational diagram generation by large language models (LLMs), which undermine content accuracy and user trust. To mitigate this issue, the authors introduce Rhetorical Structure Theory (RST) into the design of in-context exemplar prompts, guiding LLMs to produce diagrams that are logically coherent, well structured, and faithful to the source text. The effectiveness of the proposed approach is validated through a combination of human evaluation by computer science educators and automated metrics, demonstrating a reduction in hallucination rates and improved contextual fidelity. The findings further reveal a positive correlation between diagram complexity and hallucination occurrence, and highlight LLMs' limited ability to self-detect such errors, underscoring the role of structured prompting in enabling controlled and reliable diagram generation.
📝 Abstract
Generative artificial intelligence (AI) has found widespread use in computing education; at the same time, the quality of generated materials raises concerns among educators and students. This study addresses this issue by introducing a novel method for diagram code generation with in-context examples based on Rhetorical Structure Theory (RST), which aims to improve diagram generation by aligning models' output with user expectations. Our approach is evaluated by computer science educators, who assessed 150 diagrams generated with large language models (LLMs) for logical organization, connectivity, layout aesthetics, and AI hallucination. The assessment dataset is additionally investigated for its utility in automated diagram evaluation. The preliminary results suggest that our method decreases the rate of factual hallucination and improves diagram faithfulness to the provided context; however, due to LLMs' stochasticity, the quality of the generated diagrams varies. Additionally, we present an in-depth analysis and discussion of the connection between AI hallucination and the quality of generated diagrams, which reveals that text contexts of higher complexity lead to higher rates of hallucination and that LLMs often fail to detect mistakes in their own output.