AI Summary
This study investigates whether multimodal large language models (MLLMs) can accurately perceive visualization quality, specifically stress in graph layouts, using only visual input, without numerical computation.
Method: Adapting human cognitive experimental paradigms, we design a standardized visual assessment task, evaluating GPT-4o and Gemini 2.5 on identical layout images. We introduce a novel vision-agent-based prompting technique that bypasses conventional metric calculation and instead emulates human visual perception mechanisms.
Contribution/Results: Both models achieve human-expert-level accuracy in stress perception, significantly outperforming untrained human participants, and generate qualitative explanations (e.g., "nodes are uniformly distributed", "edge lengths are consistent") highly aligned with human descriptions. This work provides the first empirical evidence that MLLMs can attain human-comparable, or even superhuman, visual perception of graph layout quality, establishing a new paradigm for vision-centric evaluation of visualization aesthetics.
Abstract
In this paper, we test whether Multimodal Large Language Models (MLLMs) can match human-subject performance in tasks involving the perception of properties in network layouts. Specifically, we replicate a human-subject experiment about perceiving quality (namely stress) in network layouts using GPT-4o and Gemini-2.5. Our experiments show that giving MLLMs exactly the same study information as trained human participants yields performance similar to that of human experts and better than that of untrained non-experts. Additionally, we show that prompt engineering that deviates from the human-subject experiment can lead to better-than-human performance in some settings. Interestingly, like human subjects, the MLLMs seem to rely on visual proxies rather than computing the actual value of stress, indicating some sense or facsimile of perception. Explanations from the models provide descriptions similar to those used by the human participants (e.g., even distribution of nodes and uniform edge lengths).
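For readers unfamiliar with the metric, the sketch below shows one common way to compute layout stress numerically, which is the value that the models and the human participants estimate by eye rather than calculate. It assumes the widely used unnormalized form, the sum over node pairs of (layout distance − graph distance)² / (graph distance)², on an unweighted graph; the study may use a scale-normalized variant. The function names `shortest_path_lengths` and `layout_stress` are illustrative and do not come from the paper.

```python
import math
from collections import deque


def shortest_path_lengths(adj):
    """All-pairs hop distances via BFS; adj maps node -> iterable of neighbors."""
    dist = {}
    for source in adj:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[source] = seen
    return dist


def layout_stress(adj, pos):
    """Unnormalized stress of a layout (assumed form; lower is better).

    pos maps node -> (x, y). Pairs with no connecting path are skipped.
    """
    dist = shortest_path_lengths(adj)
    nodes = list(adj)
    total = 0.0
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            d = dist[u].get(v)
            if d is None:
                continue  # disconnected pair: no graph distance to compare
            e = math.dist(pos[u], pos[v])  # Euclidean distance in the drawing
            total += (e - d) ** 2 / d ** 2
    return total


# Path graph a - b - c drawn two ways.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
straight = {"a": (0, 0), "b": (1, 0), "c": (2, 0)}
bent = {"a": (0, 0), "b": (1, 0), "c": (1, 1)}

print(layout_stress(adj, straight))  # drawn distances match graph distances
print(layout_stress(adj, bent))      # a and c are drawn closer than 2 hops apart
```

The straight drawing reproduces every graph distance exactly, so its stress is zero, while the bent drawing is penalized only for the a–c pair. This also hints at why visual proxies such as uniform edge lengths and evenly spread nodes tend to track the metric.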