Exploring MLLMs' Perception of Network Visualization Principles

📅 2025-06-17
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study investigates whether multimodal large language models (MLLMs) can perceive visualization quality, specifically stress in network layouts, from visual input alone, without numerical computation. Method: The authors replicate a human-subject experiment on stress perception, evaluating GPT-4o and Gemini-2.5 on the same layout images shown to human participants, and additionally test prompt-engineering variants that deviate from the original study protocol. Contribution/Results: Given the same study information as trained human participants, both models match the accuracy of human experts and exceed that of untrained non-experts, and the adapted prompts yield better-than-human performance in some settings. Like the human subjects, the models appear to rely on visual proxies rather than computing stress, and their explanations (e.g., “nodes are uniformly distributed”, “edge lengths are consistent”) closely mirror human descriptions. The work offers empirical evidence that MLLMs can reach human-comparable visual perception of graph layout quality, supporting vision-centric evaluation of visualization aesthetics.
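For reference, “stress” is the standard graph-drawing quality metric: a weighted sum of squared differences between Euclidean distances in the layout and graph-theoretic distances. Below is a minimal sketch, assuming the common weighting w_uv = d_uv^-2 and a connected graph; the paper's exact normalization is not reproduced here.

```python
import numpy as np
import networkx as nx

def layout_stress(G: nx.Graph, pos: dict) -> float:
    """Stress of a layout: sum over node pairs of
    w_uv * (||x_u - x_v|| - d_uv)**2 with w_uv = d_uv**-2,
    where d_uv is the shortest-path distance. Assumes G is connected.
    Note: raw stress is scale-sensitive; studies typically rescale the
    layout to its stress-minimizing scale before comparing values."""
    nodes = list(G.nodes)
    d = dict(nx.all_pairs_shortest_path_length(G))  # graph-theoretic distances
    total = 0.0
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            d_uv = d[u][v]
            euclid = np.linalg.norm(np.asarray(pos[u]) - np.asarray(pos[v]))
            total += (euclid - d_uv) ** 2 / d_uv ** 2
    return total
```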

📝 Abstract
In this paper, we test whether Multimodal Large Language Models (MLLMs) can match human-subject performance in tasks involving the perception of properties in network layouts. Specifically, we replicate a human-subject experiment about perceiving quality (namely stress) in network layouts using GPT-4o and Gemini-2.5. Our experiments show that giving MLLMs exactly the same study information as trained human participants results in a similar performance to human experts and exceeds the performance of untrained non-experts. Additionally, we show that prompt engineering that deviates from the human-subject experiment can lead to better-than-human performance in some settings. Interestingly, like human subjects, the MLLMs seem to rely on visual proxies rather than computing the actual value of stress, indicating some sense or facsimile of perception. Explanations from the models provide descriptions similar to those used by the human participants (e.g., even distribution of nodes and uniform edge lengths).
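The visual proxies the participants and models describe are straightforward to quantify. The following is a hypothetical illustration, not code from the paper, of two such measures (both function names are assumptions):

```python
import numpy as np

def edge_length_uniformity(pos: dict, edges) -> float:
    """Coefficient of variation of edge lengths: lower values mean
    more uniform edge lengths, one proxy cited by humans and MLLMs."""
    lengths = np.array([
        np.linalg.norm(np.asarray(pos[u], dtype=float) -
                       np.asarray(pos[v], dtype=float))
        for u, v in edges
    ])
    return lengths.std() / lengths.mean()

def node_distribution_evenness(pos: dict) -> float:
    """Coefficient of variation of nearest-neighbor distances between
    nodes: lower values indicate more evenly distributed nodes, the
    other proxy mentioned in the explanations."""
    pts = np.asarray(list(pos.values()), dtype=float)
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # ignore each node's distance to itself
    nn = dists.min(axis=1)
    return nn.std() / nn.mean()
```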
Problem

Research questions and friction points this paper is trying to address.

Test whether MLLMs can perceive network layout properties as humans do
Compare MLLM and human performance on network visualization tasks
Examine whether MLLMs rely on visual proxies when perceiving stress
Innovation

Methods, ideas, or system contributions that make the work stand out.

MLLMs match trained human performance in perceiving network layout quality
Prompt engineering pushes MLLMs beyond human performance in some settings (see the sketch after this list)
MLLMs rely on visual proxies much as humans do
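The paper's exact prompts are not reproduced here; as a rough illustration of the setup, the hypothetical snippet below sends a rendered layout image to GPT-4o via the OpenAI Python SDK and asks for a visual stress judgment. The helper name and prompt text are assumptions, not the study's materials.

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge_layout(image_path: str, prompt: str) -> str:
    """Send a rendered network layout to GPT-4o and return its free-text
    judgment. Illustrative only; not the paper's actual protocol."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Example use (hypothetical prompt, loosely mirroring the study's task):
# judge_layout("layout_a.png",
#              "On a scale of 1-10, how low is the stress of this layout? "
#              "Judge visually; do not attempt numerical computation.")
```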