🤖 AI Summary
This study addresses the difficulty of evaluating visualization systems in domains where experts are scarce, such as genomics. It examines what LLM-generated synthetic personas actually produce when they evaluate a real visualization system. Using an exploratory three-condition design, the authors compare feedback on Geranium, a search engine for multimodal genomics visualization, from ungrounded synthetic personas, grounded synthetic personas constrained by voice-of-customer artifacts from a prior interview study, and a published baseline study of real domain experts. Grounded personas come closer to documented users in language and concerns, while ungrounded personas drift toward operational specifics that real participants did not raise. Both synthetic conditions nonetheless converge on a “find-and-adapt” frame and miss the image-modality preference observed among experts. The work discusses where synthetic personas might fit alongside, rather than in place of, expert studies in domain-specific visualization evaluation.
📝 Abstract
Evaluating visualization systems in niche domains such as genomics is challenging due to the scarcity of domain experts and the difficulty of recruiting a representative user base. While LLM-based synthetic personas are increasingly used to ease evaluation bottlenecks, they face well-founded skepticism. Rather than weighing synthetic personas as substitutes for real users, we ask a fundamental open question: when synthetic personas evaluate a real visualization system, what do they actually produce, and how does that output change when grounded in documented human contexts? We present Sycamore, an exploratory three-condition probe design that uses Geranium, a search engine for multimodal genomics visualization, as a case study. Sycamore evaluates Geranium using: (1) ungrounded synthetic personas from generic LLM priors; (2) grounded synthetic personas constrained by voice-of-customer artifacts from a prior interview study; and (3) a published baseline study of real domain experts. We observe that grounding shifts synthetic feedback toward the language and concerns of documented users, while ungrounded evaluators drift toward operational specifics that real participants did not raise; both synthetic conditions, however, converge on a find-and-adapt frame and miss the image-modality preference observed in the expert study. We discuss what these observations imply for where synthetic personas might fit alongside expert studies in domain-specific visualization evaluation. All supplemental materials are available at https://osf.io/kdfr3/.