Text to Automata Diagrams: Comparing TikZ Code Generation with Direct Image Synthesis

📅 2026-03-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the challenge of automatically processing student-drawn automaton diagrams for educational feedback, given their high variability in structure, layout, and correctness. The authors propose a two-stage approach. First, a vision-language model generates textual descriptions from scanned images, and human reviewers correct them. Second, both the generated and the corrected descriptions are fed into a large language model to produce TikZ code, which is compiled into diagrams and evaluated against the original scans. The work presents the first systematic comparison between text-mediated and direct image-based methods for automaton diagram reconstruction, and highlights the role of human-in-the-loop correction in pipeline accuracy. The results show that descriptions generated directly from images are often incorrect and that human correction substantially improves them, offering a viable pathway toward automated grading and more accessible instructional materials.

📝 Abstract

Diagrams are widely used in teaching computer science, particularly in subjects such as automata and formal languages and data structures. These diagrams, often drawn by students during exams or assignments, vary in structure, layout, and correctness. This study examines whether current vision-language models and large language models can process such diagrams and produce accurate textual and digital representations. Scanned student-drawn diagrams are used as input, and a vision-language model generates a textual description of each image. Human reviewers then check and revise the descriptions to make them accurate. Both the generated and the revised descriptions are fed to a large language model to generate TikZ code. The resulting diagrams are compiled and evaluated against the original scanned diagrams. We found that descriptions generated directly from images by vision-language models are often incorrect, and that human correction can substantially improve their quality. This research can help computer science education by paving the way for automated grading and feedback and by creating more accessible instructional materials.
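To make the target of the description-to-TikZ step concrete, here is a minimal Python sketch of what a structured automaton description might map to. It is an illustration only: the paper uses a large language model for this step, whereas the sketch uses a fixed template, and the `Automaton` class, the `to_tikz` function, and the left-to-right layout are all hypothetical names and choices introduced here. The emitted code assumes the TikZ `automata` and `positioning` libraries are loaded in the LaTeX preamble.

```python
from dataclasses import dataclass


@dataclass
class Automaton:
    """Hypothetical structured form of a (corrected) textual description."""
    states: list[str]
    initial: str
    accepting: set[str]
    transitions: list[tuple[str, str, str]]  # (source, label, target)


def to_tikz(a: Automaton) -> str:
    """Emit TikZ code for the automaton, placing states left to right.

    Assumes \\usetikzlibrary{automata, positioning} in the preamble.
    """
    lines = [r"\begin{tikzpicture}[shorten >=1pt, node distance=2cm, on grid, auto]"]
    prev = None
    for s in a.states:
        opts = ["state"]
        if s == a.initial:
            opts.append("initial")
        if s in a.accepting:
            opts.append("accepting")
        if prev is not None:
            opts.append(f"right=of {prev}")
        joined = ", ".join(opts)
        lines.append(rf"  \node[{joined}] ({s}) {{${s}$}};")
        prev = s
    lines.append(r"  \path[->]")
    for src, label, dst in a.transitions:
        # Self-loops are drawn above the state; other edges are bent
        # so that opposite-direction pairs do not overlap.
        edge = "edge[loop above]" if src == dst else "edge[bend left]"
        lines.append(rf"    ({src}) {edge} node {{{label}}} ({dst})")
    lines.append("  ;")
    lines.append(r"\end{tikzpicture}")
    return "\n".join(lines)


if __name__ == "__main__":
    example = Automaton(
        states=["q0", "q1"],
        initial="q0",
        accepting={"q1"},
        transitions=[("q0", "a", "q1"), ("q1", "b", "q1")],
    )
    print(to_tikz(example))
```

A fixed template like this cannot reproduce a student's layout or mistakes, which is one reason a model-based generator is a more natural fit for the task described in the abstract.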

Problem

Research questions and friction points this paper is trying to address.

automata diagrams

diagram understanding

vision-language models

TikZ code generation

computer science education

Innovation

Methods, ideas, or system contributions that make the work stand out.

vision-language models

large language models

TikZ code generation