🤖 AI Summary
This work addresses the limitations of traditional network visualization methods, which rely on heuristic metrics such as stress and often fail to produce layouts consistently aligned with human aesthetic preferences. While data-driven approaches based on human preferences are promising, they are hindered by high annotation costs and limited dataset scale. To overcome these challenges, the authors conduct a large-scale user study to collect human preference labels and propose a scalable aesthetic proxy that integrates large language models (LLMs) with vision models (VMs). Through multimodal prompt engineering, image-embedding fusion, and confidence-based filtering, the LLM achieves significantly improved alignment with human judgments across diverse inputs, reaching inter-human agreement levels once low-confidence labels are filtered out. The VM demonstrates comparable alignment, and together these results validate the effectiveness and novelty of the proposed framework.
📝 Abstract
Network visualization has traditionally relied on heuristic metrics, such as stress, under the assumption that optimizing them leads to aesthetic and informative layouts. However, no single metric consistently produces the most effective results. A data-driven alternative is to learn from human preferences, where annotators select their favored visualization among multiple layouts of the same graph. These human-preference labels can then be used to train a generative model that approximates human aesthetic preferences. However, obtaining human labels at scale is costly and time-consuming; as a result, this generative approach has so far been tested only with machine-labeled data. In this paper, we explore the use of large language models (LLMs) and vision models (VMs) as proxies for human judgment. Through a carefully designed user study involving 27 participants, we curated a large set of human preference labels. We used this data both to better understand human preferences and to bootstrap LLM/VM labelers. We show that prompt engineering combining few-shot examples with diverse input formats, such as image embeddings, significantly improves LLM-human alignment, and that additional filtering by the LLM's confidence score pushes this alignment to human-human levels. Furthermore, we demonstrate that carefully trained VMs can achieve VM-human alignment at a level comparable to that between human annotators. Our results suggest that AI can feasibly serve as a scalable proxy for human labelers.
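The confidence-based filtering step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `PreferenceLabel` structure, the confidence threshold, and the toy data are all hypothetical, assuming only that each LLM label carries a pairwise choice and a self-reported confidence score that can be compared against human labels for the same layout pairs.

```python
from dataclasses import dataclass

@dataclass
class PreferenceLabel:
    pair_id: int       # which pair of layouts was compared
    choice: str        # "A" or "B": the preferred layout in the pair
    confidence: float  # self-reported confidence in [0, 1] (hypothetical)

def filter_by_confidence(labels, threshold=0.8):
    """Discard labels whose confidence falls below the threshold."""
    return [lab for lab in labels if lab.confidence >= threshold]

def agreement_rate(llm_labels, human_labels):
    """Fraction of LLM labels that match the human choice on the same pair."""
    human = {lab.pair_id: lab.choice for lab in human_labels}
    matched = [lab for lab in llm_labels if lab.pair_id in human]
    if not matched:
        return 0.0
    hits = sum(lab.choice == human[lab.pair_id] for lab in matched)
    return hits / len(matched)

# Toy data: the LLM's one disagreement with the human annotator happens
# to come with low confidence, so filtering raises measured alignment.
llm = [
    PreferenceLabel(1, "A", 0.95),
    PreferenceLabel(2, "B", 0.55),  # low-confidence disagreement
    PreferenceLabel(3, "A", 0.90),
]
human = [
    PreferenceLabel(1, "A", 1.0),
    PreferenceLabel(2, "A", 1.0),
    PreferenceLabel(3, "A", 1.0),
]

raw = agreement_rate(llm, human)
filtered = agreement_rate(filter_by_confidence(llm), human)
```

On this toy data, agreement rises from 2/3 on the raw labels to 1.0 after filtering, mirroring the paper's finding that confidence filtering pushes LLM-human alignment toward human-human levels, at the cost of labeling fewer pairs.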