Data Augmentation for Visualization Design Knowledge Bases

📅 2025-08-04
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing visualization design knowledge bases (e.g., Draco) are trained on incomplete corpora that cover too few design variants. This hinders systematic evaluation of design trade-offs and leads to low feature coverage and suboptimal recommendation accuracy. To address this, we propose a data augmentation method that generates chart contrast pairs automatically, both by permuting designs and by targeting under-evaluated features. We further introduce a scalable, multi-strategy annotation framework, coupled with a feature importance analysis driven by model fitting, to update the knowledge base's feature weights. Experimentally, we construct an expanded corpus comprising thousands of novel chart pairs and validate our approach within the Draco system: feature coverage increases by 32%, and chart recommendation accuracy improves significantly. This work marks the first systematic knowledge enhancement targeting the design trade-off space.
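As a rough illustration of the pair-generation idea, the sketch below enumerates chart pairs that differ in exactly one design choice and reports which features an existing corpus never contrasts. It is a minimal sketch under stated assumptions, not the paper's implementation: the design space, the feature encoding, and the names `DESIGN_SPACE`, `permutation_pairs`, and `under_assessed` are all hypothetical.

```python
from collections import Counter
from itertools import product

# Hypothetical, simplified design space: a chart is one choice per dimension.
DESIGN_SPACE = {
    "mark": ["point", "bar", "line"],
    "x_scale": ["linear", "log"],
    "binned": [False, True],
}


def features(chart):
    """Map a chart to the set of (dimension, value) features it exhibits."""
    return frozenset(chart.items())


def permutation_pairs(space):
    """Yield chart pairs that differ in exactly one design dimension."""
    dims = list(space)
    charts = [dict(zip(dims, values)) for values in product(*space.values())]
    for i, a in enumerate(charts):
        for b in charts[i + 1:]:
            if sum(a[d] != b[d] for d in dims) == 1:
                yield a, b


def under_assessed(pairs, space, min_count=1):
    """Return features that distinguish fewer than min_count pairs."""
    counts = Counter()
    for a, b in pairs:
        # Only features present in one chart but not the other are contrasted.
        counts.update(features(a) ^ features(b))
    all_features = [(d, v) for d, values in space.items() for v in values]
    return [f for f in all_features if counts[f] < min_count]


# A toy "existing corpus" with a single contrast: bar versus point.
existing = [
    (
        {"mark": "bar", "x_scale": "linear", "binned": False},
        {"mark": "point", "x_scale": "linear", "binned": False},
    )
]
gaps = under_assessed(existing, DESIGN_SPACE)
new_pairs = list(permutation_pairs(DESIGN_SPACE))
```

Restricting pairs to a single differing dimension keeps each contrast attributable to one design choice. Generated pairs still carry no preference label, so they would need annotation before they can inform feature weights.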

📝 Abstract
Visualization knowledge bases enable computational reasoning and recommendation over a visualization design space. These systems evaluate design trade-offs using numeric weights assigned to different features (e.g., binning a variable). Feature weights can be learned automatically by fitting a model to a collection of chart pairs, in which one chart is deemed preferable to the other. To date, labeled chart pairs have been drawn from published empirical research results; however, such pairs are not comprehensive, resulting in a training corpus that lacks many design variants and fails to systematically assess potential trade-offs. To improve knowledge base coverage and accuracy, we contribute data augmentation techniques for generating and labeling chart pairs. We present methods to generate novel chart pairs based on design permutations and by identifying under-assessed features -- leading to an expanded corpus with thousands of new chart pairs, now in need of labels. Accordingly, we next compare varied methods to scale labeling efforts to annotate chart pairs, in order to learn updated feature weights. We evaluate our methods in the context of the Draco knowledge base, demonstrating improvements to both feature coverage and chart recommendation performance.
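To make the weight-learning step concrete, here is a minimal sketch of fitting feature weights to labeled chart pairs. It assumes each chart is summarized as a vector of feature counts and that a chart's cost is the weighted sum of those counts, with lower cost preferred. The pairwise logistic loss is a stand-in chosen for brevity; the abstract does not specify which model the authors fit, and the names `fit_weights` and `cost` and the two toy features are hypothetical.

```python
import math


def cost(weights, counts):
    """Weighted sum of feature counts; lower cost means a preferred chart."""
    return sum(w * c for w, c in zip(weights, counts))


def fit_weights(pairs, n_features, lr=0.5, epochs=200):
    """Fit weights so each preferred chart costs less than its alternative.

    pairs: list of (preferred_counts, other_counts) feature-count vectors.
    Minimizes the mean of log(1 + exp(-margin)) by gradient descent, where
    margin = cost(other) - cost(preferred).
    """
    weights = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for preferred, other in pairs:
            diff = [o - p for p, o in zip(preferred, other)]
            margin = sum(w * d for w, d in zip(weights, diff))
            # Derivative of log(1 + exp(-margin)) with respect to margin.
            g = -1.0 / (1.0 + math.exp(margin))
            for i, d in enumerate(diff):
                grad[i] += g * d
        weights = [w - lr * gi / len(pairs) for w, gi in zip(weights, grad)]
    return weights


# Toy corpus over two hypothetical features: [feature_a, feature_b].
# In each pair, the first vector belongs to the chart labeled preferable.
labeled_pairs = [
    ([0, 1], [1, 0]),  # feature_b preferred over feature_a
    ([0, 0], [1, 0]),  # neither preferred over feature_a
    ([0, 1], [1, 1]),  # dropping feature_a preferred
    ([0, 0], [0, 1]),  # neither preferred over feature_b
]
weights = fit_weights(labeled_pairs, n_features=2)
```

A feature that never differs between the two charts of any pair receives zero gradient and keeps its initial weight, which is one way to see why a corpus with poor feature coverage leaves parts of the knowledge base effectively untrained.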

Problem

Research questions and friction points this paper is trying to address.

Enhancing visualization knowledge bases with data augmentation
Generating and labeling diverse chart pairs systematically
Improving feature coverage and recommendation accuracy in Draco

Innovation

Methods, ideas, or system contributions that make the work stand out.

Data augmentation for generating chart pairs
Automated labeling of design permutations
Improved feature coverage and recommendation accuracy