Query2Diagram: Answering Developer Queries with UML Diagrams

📅 2026-04-26

🤖 AI Summary
Existing automated UML tools often produce information-overloaded and semantically fragmented diagrams due to their inability to comprehend developer intent. This work proposes a query-driven approach to UML diagram generation by fine-tuning the Qwen2.5-Coder-14B model on a structured dataset comprising natural language queries, source code, and corresponding UML diagrams. To balance semantic focus with structural fidelity, the method incorporates a JSON-based intermediate representation and human-in-the-loop correction. Experimental results demonstrate that the proposed approach significantly outperforms existing large language model-based solutions in both semantic relevance and structural correctness, achieving the highest F1 score and the lowest defect rate. This study presents the first viable pathway toward on-demand generation of high-quality UML documentation.

📝 Abstract
Software documentation frequently becomes outdated or fails to exist entirely, yet developers need focused views of their codebase to understand complex systems. While automated reverse engineering tools can generate UML diagrams from code, they produce overwhelming detail without considering developer intent. We introduce query-driven UML diagram generation, where LLMs create diagrams that directly answer natural language questions about code. Unlike existing methods, our approach produces semantically focused diagrams containing only relevant elements with contextual descriptions. We fine-tune Qwen2.5-Coder-14B on a curated dataset of code files, developer queries, and corresponding diagram representations in a structured JSON format, evaluating with both automatic detection of structural defects and human assessment of semantic relevance. Results demonstrate that fine-tuning on a modest amount of manually corrected data yields dramatic improvements: our best model achieves the highest F1 scores while reducing defect rates below those of state-of-the-art LLMs, generating diagrams that are both structurally sound and semantically faithful to developer queries. Thus, we establish the feasibility of using LLMs for scalable, contextual, on-demand documentation generation. We make our code and dataset publicly available at https://github.com/i-need-a-pencil/query2diagram.
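To make the idea of a structured JSON diagram representation with automatic structural defect detection more concrete, here is a minimal Python sketch. The schema (`nodes`, `edges`, `id`, `source`, `target`, `description`), the function name `find_structural_defects`, and the two defect types it checks are all assumptions made for illustration; the abstract does not specify the actual format or defect taxonomy, which should be taken from the published repository.

```python
import json

# Hypothetical diagram schema (an assumption, not the paper's actual format):
# {"nodes": [{"id": str, "description": str}],
#  "edges": [{"source": str, "target": str, "kind": str}]}
EXAMPLE_JSON = """
{
  "nodes": [
    {"id": "OrderService", "description": "Creates and validates orders"},
    {"id": "PaymentGateway", "description": "Charges the customer"}
  ],
  "edges": [
    {"source": "OrderService", "target": "PaymentGateway", "kind": "uses"},
    {"source": "OrderService", "target": "Logger", "kind": "uses"}
  ]
}
"""


def find_structural_defects(diagram: dict) -> list[str]:
    """Return descriptions of structural defects found in a diagram dict."""
    defects = []
    seen_ids = set()

    # Every node needs a unique, non-empty identifier.
    for node in diagram.get("nodes", []):
        node_id = node.get("id")
        if not node_id:
            defects.append("node without id")
            continue
        if node_id in seen_ids:
            defects.append(f"duplicate node id: {node_id}")
        seen_ids.add(node_id)

    # Every edge endpoint must refer to a declared node.
    for edge in diagram.get("edges", []):
        for end in ("source", "target"):
            ref = edge.get(end)
            if ref not in seen_ids:
                defects.append(f"edge {end} refers to unknown node: {ref}")

    return defects


diagram = json.loads(EXAMPLE_JSON)
for defect in find_structural_defects(diagram):
    print(defect)
```

In this example the second edge points at `Logger`, which is never declared as a node, so the check reports one dangling reference. A check of this kind could be run on model output before rendering, although the defect categories actually evaluated in the paper may differ.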
Problem

Research questions and friction points this paper is trying to address.

software documentation
UML diagrams
developer queries
reverse engineering
code understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

query-driven UML generation
large language models
code documentation
semantic diagram synthesis
fine-tuning