🤖 AI Summary
Visual understanding requires multi-step reasoning and spatial attention, yet expert strategies for chart interpretation have long lacked explicit computational modeling. Method: We propose ViStruct—the first interpretable framework that explicitly simulates expert task decomposition and selective region attention in visualization analysis. It parses complex visual questions into structured analytical steps and generates semantically grounded attention cues by detecting chart components and mapping them to spatial subtasks. The approach integrates large language models, vision-language models, and spatial reasoning techniques. Contribution/Results: Evaluated across 45 visualization tasks spanning 12 categories, ViStruct’s outputs are consistently rated by professional visualization practitioners as both interpretable and aligned with expert cognitive logic. It establishes a reproducible, expert-level explanatory paradigm for visual literacy tool development.
📝 Abstract
Data visualization tasks often require multi-step reasoning, yet the interpretive strategies experts use, such as decomposing complex goals into smaller subtasks and selectively attending to key chart regions, are rarely made explicit. ViStruct is an automated pipeline that simulates these expert behaviors by breaking high-level visual questions into structured analytic steps and highlighting semantically relevant chart regions. Leveraging large language and vision-language models, ViStruct identifies chart components, maps subtasks to spatial regions, and presents visual attention cues that externalize expert-like reasoning flows. While not designed for direct novice instruction, ViStruct provides a replicable model of expert interpretation that can inform the development of future visual literacy tools. We evaluate the system on 45 tasks across 12 chart types and validate its outputs with trained visualization users, confirming that it produces interpretable, expert-aligned reasoning sequences.
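The pipeline the abstract describes can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the hard-coded bar-chart components, and the rule-based decomposition below are all hypothetical stand-ins for what the paper says are LLM and vision-language-model stages (question decomposition, component detection, subtask-to-region mapping).

```python
from dataclasses import dataclass

@dataclass
class Subtask:
    description: str
    region: str  # named chart region the attention cue would highlight

# Hypothetical component detector: in the described system a vision-language
# model would locate these; here we hard-code a bar chart's regions.
def detect_components(chart_type: str) -> list[str]:
    if chart_type == "bar":
        return ["x-axis", "y-axis", "bars", "legend"]
    raise ValueError(f"unknown chart type: {chart_type}")

# Hypothetical decomposition step: a rule-based stand-in for the LLM that
# breaks a high-level question into ordered analytic steps, each mapped
# to a spatial region of the chart.
def decompose(question: str, components: list[str]) -> list[Subtask]:
    steps = [
        Subtask("Identify the categories being compared", "x-axis"),
        Subtask("Read the value scale", "y-axis"),
        Subtask("Compare bar heights to answer the question", "bars"),
    ]
    # Keep only steps whose target region was actually detected.
    return [s for s in steps if s.region in components]

def vistruct_pipeline(question: str, chart_type: str = "bar") -> list[Subtask]:
    components = detect_components(chart_type)
    return decompose(question, components)

plan = vistruct_pipeline("Which product sold the most?")
for i, step in enumerate(plan, 1):
    print(f"{i}. {step.description} -> highlight {step.region}")
```

The ordered list of (description, region) pairs is the key output shape: each step pairs a reasoning move with a spatial attention cue, which is what allows the reasoning flow to be rendered as highlights on the chart.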