Exploring Interaction Paradigms for LLM Agents in Scientific Visualization

📅 2026-04-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates how to effectively leverage large language model (LLM) agents to generate scientific visualization workflows from natural-language instructions, balancing performance, efficiency, and flexibility. We present the first systematic evaluation of three LLM agent paradigms (domain-specific, computer-use, and general-purpose programming agents) across 15 scientific visualization benchmark tasks. The analysis examines how interaction modalities (GUI, CLI, API, MCP, and scripting), tool invocation strategies, and persistent memory mechanisms influence visualization quality, efficiency, and robustness. Results show that general-purpose programming agents achieve the highest success rates but incur substantial computational overhead; domain-specific agents are efficient yet inflexible; computer-use agents struggle with long-horizon planning; and persistent memory enhances performance on repeated tasks, though its efficacy depends critically on interaction modality and feedback quality. This work offers guidance for designing next-generation intelligent visualization systems that integrate multiple agent mechanisms.

📝 Abstract

This paper examines how different types of large language model (LLM) agents perform on scientific visualization (SciVis) tasks, where users generate visualization workflows from natural-language instructions. We compare three primary interaction paradigms (domain-specific agents with structured tool use, computer-use agents, and general-purpose coding agents) by evaluating eight representative agents across 15 benchmark tasks and measuring visualization quality, efficiency, robustness, and computational cost. We further analyze interaction modalities: code scripts and Model Context Protocol (MCP) or API calls for structured tool use, and command-line interfaces (CLI) and graphical user interfaces (GUI) for more general interaction. We also study the effect of persistent memory in selected agents. The results reveal clear tradeoffs across paradigms and modalities. General-purpose coding agents achieve the highest task success rates but are computationally expensive, while domain-specific agents are more efficient and stable but less flexible. Computer-use agents perform well on individual steps but struggle with longer multi-step workflows, indicating that long-horizon planning is their primary limitation. In both CLI- and GUI-based settings, persistent memory improves performance over repeated trials, although its benefits depend on the underlying interaction mode and the quality of feedback. These findings suggest that no single approach is sufficient, and that future SciVis systems should combine structured tool use, interactive capabilities, and adaptive memory mechanisms to balance performance, robustness, and flexibility.

Problem

Research questions and friction points this paper is trying to address.

LLM agents

scientific visualization

interaction paradigms

visualization workflows

natural-language instructions

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM agents

scientific visualization

interaction paradigms