🤖 AI Summary
In visual analytics, large language models (LLMs) struggle to model the continuous semantic interactions between users and visual workspaces, leading to imprecise iterative refinement of analytical reports. To address this, we propose VIS-ReAct, a novel dual-agent framework. The Analysis Agent interprets visual-semantic interactions, infers user intent, and devises refinement strategies; the Refinement Agent performs progressive text generation and report updating. VIS-ReAct tightly integrates visual workspace representation, intent-driven reasoning, and explainable text generation. Experimental results demonstrate that our approach significantly outperforms baselines in refinement accuracy, semantic fidelity, and reasoning transparency. It thus enhances both precision and interpretability in human-AI collaborative report writing.
📄 Abstract
Sensemaking report writing often requires multiple refinements in the iterative process. While Large Language Models (LLMs) have shown promise in generating initial reports based on human visual workspace representations, they struggle to precisely incorporate sequential semantic interactions during the refinement process. We introduce VIS-ReAct, a framework that reasons about newly-added semantic interactions in visual workspaces to steer the LLM for report refinement.
VIS-ReAct is a two-agent framework: a primary LLM analysis agent interprets new semantic interactions to infer user intentions and generates a refinement plan, followed by an LLM refinement agent that updates the report accordingly. Through a case study, VIS-ReAct outperforms the baseline and an ablated VIS-ReAct (without the LLM analysis agent) on targeted refinement, semantic fidelity, and transparent inference. Results demonstrate that VIS-ReAct better handles various interaction types and granularities while enhancing the transparency of human-LLM collaboration.
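The two-agent loop described above (analysis agent infers intent and plans, refinement agent applies the plan to the report) can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the class `RefinementPlan` and the functions `analysis_agent` and `refinement_agent` are assumed names, and the LLM calls are stubbed out with placeholder logic.

```python
from dataclasses import dataclass

@dataclass
class RefinementPlan:
    """Output of the analysis agent (illustrative structure)."""
    intent: str         # inferred user intention
    instructions: str   # concrete edit instructions for the refinement agent

def analysis_agent(report: str, new_interactions: list[str]) -> RefinementPlan:
    """Interpret newly added semantic interactions and infer user intent.

    A real system would prompt an LLM here; this stub fabricates a plan
    from the raw interaction labels to show the data flow only.
    """
    intent = f"user focused on: {', '.join(new_interactions)}"
    return RefinementPlan(
        intent=intent,
        instructions=f"expand the report to cover {new_interactions[-1]}",
    )

def refinement_agent(report: str, plan: RefinementPlan) -> str:
    """Update the report according to the plan (LLM call stubbed out)."""
    return report + f"\n[refined: {plan.instructions}]"

# One iteration of the human-LLM refinement loop
report = "Initial sensemaking report."
plan = analysis_agent(report, ["cluster A vs cluster B"])
report = refinement_agent(report, plan)
```

Separating planning from generation in this way is what lets the framework surface the inferred intent (`plan.intent`) to the user, which is the transparency property the abstract emphasizes.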