🤖 AI Summary
Large language models (LLMs) exhibit limited capability in scientific chart understanding, particularly in geospatial visualization question answering. To address this, we propose a lightweight, fine-tuning-free multimodal fusion framework enabling natural-language-driven, plug-and-play interactive chart QA. Our method jointly encodes visual semantics and structured data descriptions into a compact, structured textual representation, aligning multimodal features via visualization snapshot encoding and zero-shot contextual enhancement. Key contributions include: (1) the first structured compact textual representation that simultaneously captures both the visual and tabular semantics of scientific charts; and (2) an integrated architecture combining multimodal feature alignment, snapshot-based visual encoding, and zero-shot context augmentation. Evaluated on GeoVista and other geovisualization benchmarks, our approach achieves state-of-the-art zero-shot performance, significantly improving both answer accuracy and interpretability in scientific visualization QA.
📝 Abstract
We present a method for augmenting a Large Language Model (LLM) with a combination of text and visual data to enable accurate question answering over scientific data visualizations, making conversational visualization possible. LLMs struggle with tasks like visual data interaction because they lack contextual visual information. We address this problem by merging a textual description of a visualization and its dataset with snapshots of the visualization. We extract their essential features into a structured text file that is highly compact, yet descriptive enough to augment the LLM with the necessary contextual information, without any fine-tuning. This approach can be applied to any fully rendered visualization, as long as it is accompanied by some textual description.
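The fusion step described above (merging a visualization's textual description, a dataset summary, and snapshot-derived captions into one compact structured context for a frozen LLM) can be sketched in Python. This is a minimal illustration under stated assumptions: the section tags (`[VISUALIZATION]`, `[SNAPSHOT]`, `[DATA]`) and the per-column summary fields are hypothetical, not the paper's actual file format.

```python
# Hypothetical sketch of fine-tuning-free context fusion: serialize visual
# and tabular semantics into one compact text block, then prepend it to the
# user's question so an off-the-shelf LLM can answer it zero-shot.
# Tag names and field layout are illustrative assumptions.

def build_context(description: str, data_summary: dict, snapshot_caption: str) -> str:
    """Fuse chart description, snapshot caption, and dataset stats into
    a single compact, structured textual representation."""
    lines = [
        "[VISUALIZATION]", description.strip(),
        "[SNAPSHOT]", snapshot_caption.strip(),
        "[DATA]",
    ]
    for column, stats in data_summary.items():
        unit = stats.get("unit", "n/a")
        lines.append(f"{column}: min={stats['min']}, max={stats['max']}, unit={unit}")
    return "\n".join(lines)


def make_prompt(context: str, question: str) -> str:
    """Prepend the fused context so the LLM answers grounded in the chart,
    with no fine-tuning required."""
    return f"{context}\n\n[QUESTION]\n{question}\n[ANSWER]"


# Example: a choropleth map with one summarized data column.
context = build_context(
    "Choropleth map of annual rainfall across European regions.",
    {"rainfall_mm": {"min": 320, "max": 2150, "unit": "mm"}},
    "Darker blue regions along the Atlantic coast indicate higher rainfall.",
)
prompt = make_prompt(context, "Which regions receive the most rainfall?")
```

The resulting `prompt` string would then be sent to any chat-capable LLM as-is; because all contextual information is plain text, the approach stays plug-and-play across models.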