Augmenting a Large Language Model with a Combination of Text and Visual Data for Conversational Visualization of Global Geospatial Data

📅 2025-01-16
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
Large language models (LLMs) exhibit limited capability in scientific chart understanding—particularly in geospatial visualization question answering. To address this, we propose a lightweight, fine-tuning-free multimodal fusion framework enabling natural-language-driven, plug-and-play interactive chart QA. Our method jointly encodes visual semantics and structured data descriptions into a compact, structured textual representation, aligning multimodal features via visualization snapshot encoding and zero-shot contextual enhancement. Key contributions include: (1) the first structured compact textual representation that simultaneously captures both visual and tabular semantics of scientific charts; and (2) an integrated architecture combining multimodal feature alignment, snapshot-based visual encoding, and zero-shot context augmentation. Evaluated on GeoVista and other geovisualization benchmarks, our approach achieves state-of-the-art zero-shot performance, significantly improving both answer accuracy and interpretability in scientific visualization QA.

📝 Abstract
We present a method for augmenting a Large Language Model (LLM) with a combination of text and visual data to enable accurate question answering over visualizations of scientific data, making conversational visualization possible. LLMs struggle with tasks like visual data interaction because they lack contextual visual information. We address this problem by merging a textual description of a visualization and its dataset with snapshots of the visualization. We extract their essential features into a structured text file that is highly compact yet descriptive enough to augment the LLM with contextual information, without any fine-tuning. This approach can be applied to any fully rendered visualization, as long as it is accompanied by some textual description.
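The core idea in the abstract — fusing a textual description of the visualization and dataset with a snapshot-derived summary into one compact structured text block, then prepending that block to the user's question — can be sketched as follows. This is a minimal illustration, not the paper's implementation; all function names, field labels, and the example data are hypothetical assumptions.

```python
# Hypothetical sketch: fuse a visualization's textual description and a
# snapshot-derived visual summary into one compact structured text context,
# then prepend it to a question for zero-shot LLM prompting (no fine-tuning).

def build_chart_context(dataset_description, visual_summary, data_rows):
    """Merge text and visual metadata into a compact structured text block."""
    lines = ["[CHART CONTEXT]"]
    lines.append(f"description: {dataset_description}")
    lines.append(f"visual: {visual_summary}")
    lines.append("data:")
    for name, value in data_rows:
        lines.append(f"  - {name}: {value}")
    return "\n".join(lines)

def make_prompt(context, question):
    """Prepend the structured context to the user's question (zero-shot)."""
    return f"{context}\n\n[QUESTION]\n{question}\n[ANSWER]"

# Illustrative geospatial example (values invented for demonstration).
context = build_chart_context(
    "Global sea-surface temperature anomalies, 2023",
    "World map, diverging red-blue colormap centered at 0 °C",
    [("North Atlantic", "+1.4 °C"), ("Southern Ocean", "+0.3 °C")],
)
prompt = make_prompt(context, "Which region shows the largest anomaly?")
```

The resulting `prompt` string would then be sent to any off-the-shelf LLM; because all context travels as plain text, the approach is plug-and-play across models.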
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Image Understanding
Scientific Charts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Model
Visual-text Integration
Dialogue-based Interaction
Omar Mena
King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Alexandre Kouyoumdjian
King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Lonni Besançon
Department of Science and Technology, Linköping University, Sweden
Michael Gleicher
Professor of Computer Sciences, University of Wisconsin-Madison
Computer Graphics, Visualization, Robotics
Ivan Viola
King Abdullah University of Science and Technology (KAUST), Saudi Arabia
Computer Graphics, Visualization, Illustrative Visualization, Molecular Visualization
A. Ynnerman
Department of Science and Technology, Linköping University, Sweden