🤖 AI Summary
This paper addresses the challenge of rapidly perceiving data insights in data-rich documents. It proposes an automated paradigm for generating word-scale visualizations embedded directly in text. The method introduces a four-module framework (Discoverer, Annotator, Extractor, and Visualizer) that combines large language models (LLMs) with visualization design knowledge to perform end-to-end insight extraction, semantic annotation, and rule-based visual encoding. The key contribution lies in embedding LLMs throughout the document-level data understanding and visualization generation pipeline while keeping the output interpretable and reader-friendly. Technical evaluations demonstrate the robustness of each module, and a user study (N=12) shows that the approach improves comprehension accuracy (+5.6%) and significantly reduces mental demand (p=0.016) and perceived effort (p=0.033).
📝 Abstract
Data-rich documents are ubiquitous in various applications, yet they often rely solely on textual descriptions to convey data insights. Prior research has primarily focused on providing visualization-centric augmentation to data-rich documents. However, few have explored using automatically generated word-scale visualizations to enhance the document-centric reading process. As an exploratory step, we propose GistVis, an automatic pipeline that extracts and visualizes data insights from text descriptions. GistVis decomposes the generation process into four modules: Discoverer, Annotator, Extractor, and Visualizer, with the first three modules utilizing the capabilities of large language models and the fourth using visualization design knowledge. Technical evaluation, including a comparative study on Discoverer and an ablation study on Annotator, reveals decent performance of GistVis. Meanwhile, the user study (N=12) shows that GistVis can generate satisfactory word-scale visualizations, indicating its effectiveness in facilitating users' understanding of data-rich documents (+5.6% accuracy) while significantly reducing their mental demand (p=0.016) and perceived effort (p=0.033).
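To make the four-stage decomposition concrete, below is a minimal, hypothetical sketch of how such a pipeline could be wired together. All function names, the `Segment` record, and the stub logic are illustrative assumptions: in the actual system, Discoverer, Annotator, and Extractor would each issue LLM calls, and Visualizer would apply the paper's design rules; here they are replaced with trivial heuristics so the control flow is runnable.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Segment:
    """One text span that may carry a data insight (hypothetical structure)."""
    text: str
    insight_type: Optional[str] = None  # e.g. "comparison", "trend", "value"
    data: list = field(default_factory=list)
    spec: Optional[dict] = None

def discoverer(paragraph: str) -> list[Segment]:
    # Stand-in for the LLM step that segments text and flags insight-bearing spans.
    return [Segment(text=s.strip() + ".") for s in paragraph.split(".") if s.strip()]

def annotator(seg: Segment) -> Segment:
    # Stand-in for the LLM step that classifies the insight type.
    seg.insight_type = "comparison" if " than " in seg.text else "value"
    return seg

def extractor(seg: Segment) -> Segment:
    # Stand-in for the LLM step that pulls out the numeric values mentioned.
    for tok in seg.text.split():
        cleaned = tok.rstrip("%.,")
        if cleaned.replace(".", "", 1).isdigit():
            seg.data.append(float(cleaned))
    return seg

def visualizer(seg: Segment) -> Segment:
    # Rule-based (no LLM): map insight type to a word-scale chart specification.
    marks = {"comparison": "bar", "trend": "line", "value": "single-mark"}
    seg.spec = {"mark": marks[seg.insight_type], "values": seg.data}
    return seg

def gistvis_pipeline(paragraph: str) -> list[Segment]:
    # Chain the four modules end to end over each discovered segment.
    return [visualizer(extractor(annotator(s))) for s in discoverer(paragraph)]
```

For example, feeding a sentence like "Sales rose to 120 units in 2021, higher than 95 in 2020." through `gistvis_pipeline` would yield one segment tagged as a comparison with a bar-mark spec. The real system's output would instead be driven by LLM prompts and the authors' encoding rules rather than these keyword heuristics.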