🤖 AI Summary
This work addresses the challenge of generating semantically accurate scientific architecture diagrams from natural language, a task hindered by the absence of high-quality, large-scale open datasets. To close this gap, the authors introduce the first large-scale open-source dataset designed for this purpose, comprising scientific architecture diagrams, their textual descriptions, and the corresponding DOT code. Using this dataset, they fine-tune a suite of small language models to translate text descriptions into DOT code that is then rendered as a diagram, and they also evaluate GPT-4o with in-context learning. Experiments show that the fine-tuned small models perform on par with GPT-4o and significantly outperform baselines such as DiagramAgent. The code, dataset, and trained models are publicly released to support further research.
📝 Abstract
Communicating complex system designs or scientific processes through text alone is inefficient and prone to ambiguity. A system that automatically generates scientific architecture diagrams from text with high semantic fidelity is useful in applications such as enterprise architecture visualization, AI-driven software design, and educational content creation. In this paper, we therefore use language models to interpret the input text description and generate intermediate code that can be rendered into high-fidelity architecture diagrams. Unfortunately, no clean, large-scale, open-access dataset exists for this task, and consequently no effective open models exist either. We therefore contribute a comprehensive dataset, \system{}, comprising scientific architecture images, their corresponding textual descriptions, and the associated DOT code representations. Using this resource, we fine-tune a suite of small language models and also perform in-context learning with GPT-4o. Through extensive experiments, we show that \system{} models significantly outperform existing baselines such as DiagramAgent and perform on par with in-context learning-based generations from GPT-4o. We make the code, data, and models publicly available.
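To make the "intermediate code" idea concrete, the sketch below shows what one text/DOT pair might look like and how the graph structure encoded in DOT can be read back out. The record layout (`DiagramRecord` and its field names), the example description, and the `extract_edges` helper are all hypothetical illustrations, not the dataset's actual schema or the authors' tooling; the edge extraction is a rough regular-expression approximation rather than a full DOT parser.

```python
import re
from dataclasses import dataclass


@dataclass
class DiagramRecord:
    """Hypothetical layout of one dataset entry (field names are illustrative only)."""
    description: str  # natural-language description of the architecture
    dot_code: str     # Graphviz DOT source for the diagram
    image_path: str   # path to the corresponding diagram image


# Matches simple directed edges such as `A -> B` or `"CNN Encoder" -> Decoder`.
# Node IDs are either bare words or double-quoted strings.
EDGE_RE = re.compile(r'("[^"]+"|\w+)\s*->\s*("[^"]+"|\w+)')


def extract_edges(dot_code: str) -> list[tuple[str, str]]:
    """Return (source, target) pairs for simple `a -> b` edges in DOT source.

    Rough sketch, not a DOT parser: only the first hop of a chain like
    `a -> b -> c` is captured, and subgraphs, ports, and arrows inside
    quoted labels are not handled.
    """
    return [(src.strip('"'), dst.strip('"')) for src, dst in EDGE_RE.findall(dot_code)]


# Hypothetical example record.
record = DiagramRecord(
    description=(
        "An input image is passed to a CNN encoder, whose features feed "
        "a transformer decoder that outputs a caption."
    ),
    dot_code="""digraph G {
  rankdir=LR;
  Input [shape=box];
  Input -> "CNN Encoder";
  "CNN Encoder" -> "Transformer Decoder";
  "Transformer Decoder" -> Caption;
}""",
    image_path="example.png",
)

if __name__ == "__main__":
    print(extract_edges(record.dot_code))
```

Because DOT is plain text describing nodes and edges, a generated diagram can be inspected structurally in this way, and the same source can be rendered with the Graphviz `dot` command-line tool (for example, `dot -Tpng diagram.dot -o diagram.png`).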