nvBench 2.0: A Benchmark for Natural Language to Visualization under Ambiguity

📅 2025-03-17
🤖 AI Summary
Problem: Natural language to visualization (NL2VIS) systems struggle with ambiguous user queries, because users often describe their visualization needs in imprecise language. Method: The paper introduces nvBench 2.0, a benchmark for evaluating NL2VIS under ambiguity, comprising 7,878 natural language queries and 24,076 corresponding visualizations derived from 780 tables across 153 domains. The benchmark is built with a controlled ambiguity-injection pipeline that works in reverse: it starts from unambiguous seed visualizations and selectively injects ambiguities, so each query has multiple valid interpretations, and each interpretation is traceable to its visualization through a step-wise reasoning path. The authors also propose Step-NL2VIS, an LLM-based model trained on nvBench 2.0 with step-wise preference optimization. Contribution/Results: In an evaluation of several LLMs on nvBench 2.0, Step-NL2VIS outperforms all baselines and sets a new state of the art for ambiguous NL2VIS tasks.

📝 Abstract
Natural Language to Visualization (NL2VIS) enables users to create visualizations from natural language queries, making data insights more accessible. However, NL2VIS faces challenges in interpreting ambiguous queries, as users often express their visualization needs in imprecise language. To address this challenge, we introduce nvBench 2.0, a new benchmark designed to evaluate NL2VIS systems in scenarios involving ambiguous queries. nvBench 2.0 includes 7,878 natural language queries and 24,076 corresponding visualizations, derived from 780 tables across 153 domains. It is built using a controlled ambiguity-injection pipeline that generates ambiguous queries through a reverse-generation workflow. By starting with unambiguous seed visualizations and selectively injecting ambiguities, the pipeline yields multiple valid interpretations for each query, with each ambiguous query traceable to its corresponding visualization through step-wise reasoning paths. We evaluate various Large Language Models (LLMs) on their ability to perform ambiguous NL2VIS tasks using nvBench 2.0. We also propose Step-NL2VIS, an LLM-based model trained on nvBench 2.0, which enhances performance in ambiguous scenarios through step-wise preference optimization. Our results show that Step-NL2VIS outperforms all baselines, setting a new state-of-the-art for ambiguous NL2VIS tasks.
Problem

Research questions and friction points this paper is trying to address.

Addresses ambiguity in natural language to visualization queries.
Introduces nvBench 2.0 for evaluating NL2VIS systems with ambiguous queries.
Proposes Step-NL2VIS model to improve performance in ambiguous NL2VIS tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Controlled ambiguity-injection pipeline for query generation
Step-wise reasoning paths for ambiguous query resolution
Step-NL2VIS model with step-wise preference optimization
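
The reverse-generation idea behind the ambiguity-injection pipeline can be illustrated with a small sketch. This is not the authors' implementation: the `VisSpec` fields, the `inject_ambiguity` helper, and the example column names are all hypothetical, chosen only to show how one unambiguous seed visualization can expand into several valid interpretations, each paired with the reasoning steps that lead to it.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class VisSpec:
    """Toy visualization spec; the fields are illustrative, not nvBench 2.0's schema."""
    mark: str
    x: str
    y: str
    aggregate: str


def inject_ambiguity(seed, alternatives):
    """Expand an unambiguous seed into every interpretation an ambiguous query
    could mean, pairing each spec with the choices that produced it."""
    fields = sorted(alternatives)
    results = []
    for combo in product(*(alternatives[f] for f in fields)):
        choices = dict(zip(fields, combo))
        # One reasoning step per ambiguous component that had to be resolved.
        path = [f"resolve {name} -> {value}" for name, value in choices.items()]
        results.append((replace(seed, **choices), path))
    return results


# Hypothetical seed: an unambiguous bar chart of mean gross per genre.
seed = VisSpec(mark="bar", x="genre", y="gross", aggregate="mean")

# Assumed ambiguities: the chart type is left unspecified, and a vague word
# in the query could refer to either of two columns.
alternatives = {
    "mark": ["bar", "line"],
    "y": ["gross", "budget"],
}

interpretations = inject_ambiguity(seed, alternatives)
for spec, path in interpretations:
    print(spec, path)
```

Because every alternative list includes the seed's own value, the seed visualization always remains one of the valid answers, which mirrors the abstract's description of starting from unambiguous seeds and selectively widening them.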