🤖 AI Summary
This work addresses the challenge of ensuring visual consistency in natural language–driven code customization for large language models (LLMs), particularly when generating TikZ graphics. To this end, we introduce vTikZ—the first vision-code alignment benchmark specifically designed for TikZ. Methodologically, we propose a novel evaluation paradigm that integrates visual feedback, combining parameterized ground-truth generation, visualization-based discrepancy comparison, and a human-in-the-loop verification framework. Our key contributions are threefold: (1) the first systematic assessment of LLMs across three interdependent objectives: feature localization, semantically coherent modification, and visual alignment; (2) the first quantitative metric for intent-to-visual consistency; and (3) the open-sourcing of a visualization auditing tool and a reproducible test suite. Empirical evaluation reveals that state-of-the-art LLMs achieve only <32% average accuracy on vTikZ, underscoring the task's difficulty and establishing vTikZ as a foundational benchmark and methodological resource for vision-guided code editing research.
📝 Abstract
With the rise of AI-based code generation, customizing existing code from natural language instructions to modify visual results, such as figures or images, has become possible, promising to reduce the need for deep programming expertise. However, even experienced developers can struggle with this task, as it requires identifying relevant code regions (feature location), generating valid code variants, and ensuring the modifications reliably align with user intent. In this paper, we introduce vTikZ, the first benchmark designed to evaluate the ability of Large Language Models (LLMs) to customize code while preserving coherent visual outcomes. Our benchmark consists of carefully curated TikZ editing scenarios, parameterized ground truths, and a reviewing tool that leverages visual feedback to assess correctness. Empirical evaluation with state-of-the-art LLMs shows that existing solutions struggle to reliably modify code in alignment with visual intent, highlighting a gap in current AI-assisted code editing approaches. We argue that vTikZ opens new research directions for integrating LLMs with visual feedback mechanisms to improve code customization tasks in various domains beyond TikZ, including image processing, art creation, Web design, and 3D modeling.
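To make the task concrete, consider a hypothetical editing scenario of the kind the benchmark targets (this sketch is illustrative only and is not drawn from vTikZ itself). Given a TikZ snippet and the instruction "make the circle blue and twice as large," the model must locate the relevant parameters (the color option and the radius) and change only those, leaving the rest of the drawing intact:

```latex
% Original snippet: a red circle of radius 1
\begin{tikzpicture}
  \draw[red, thick] (0,0) circle (1);
\end{tikzpicture}

% Instruction: "make the circle blue and twice as large"
% A correct edit touches only the color and radius parameters:
\begin{tikzpicture}
  \draw[blue, thick] (0,0) circle (2);
\end{tikzpicture}
```

Verifying such an edit requires comparing the compiled images, not just the code diff, which is why the benchmark pairs each scenario with a visual-feedback reviewing tool.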