Accurate and Consistent Graph Model Generation from Text with Large Language Models

📅 2025-07-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses three critical challenges in generating graph models from natural language using large language models (LLMs): syntactic violations (deviations from the metamodel), constraint inconsistencies (violations of domain-specific rules), and content hallucinations (introduction of spurious elements). To tackle these, we propose an “abstraction–concretization” framework: first, aggregating outputs from multiple LLMs to construct a probabilistic partial model; then, refining it into a complete, constraint-compliant graph model via a constraint-driven optimization process. The method integrates probabilistic modeling, symbolic constraint reasoning, and self-consistency-based ensemble mechanisms. Extensive experiments across diverse open- and closed-source LLMs demonstrate substantial improvements in syntactic correctness, constraint adherence, and semantic fidelity—effectively suppressing structural errors and hallucinations. Our approach establishes a verifiable, constraint-governed generation paradigm for model-driven engineering powered by LLMs.

📝 Abstract
Graph model generation from natural language description is an important task with many applications in software engineering. With the rise of large language models (LLMs), there is a growing interest in using LLMs for graph model generation. Nevertheless, LLM-based graph model generation typically produces partially correct models that suffer from three main issues: (1) syntax violations: the generated model may not adhere to the syntax defined by its metamodel, (2) constraint inconsistencies: the structure of the model might not conform to some domain-specific constraints, and (3) inaccuracy: due to the inherent uncertainty in LLMs, the models can include inaccurate, hallucinated elements. While the first issue is often addressed through techniques such as constraint decoding or filtering, the latter two remain largely unaddressed. Motivated by recent self-consistency approaches in LLMs, we propose a novel abstraction-concretization framework that enhances the consistency and quality of generated graph models by considering multiple outputs from an LLM. Our approach first constructs a probabilistic partial model that aggregates all candidate outputs and then refines this partial model into the most appropriate concrete model that satisfies all constraints. We evaluate our framework on several popular open-source and closed-source LLMs using diverse datasets for model generation tasks. The results demonstrate that our approach significantly improves both the consistency and quality of the generated graph models.
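The abstraction step described above — aggregating several candidate outputs into a probabilistic partial model — can be sketched roughly as follows. The edge-list representation, function names, and example metamodel elements here are illustrative assumptions, not the paper's actual data structures:

```python
# Sketch of the "abstraction" step: aggregate candidate graph models
# (here, simple (source, label, target) edge lists) sampled from an LLM
# into a probabilistic partial model recording how often each element
# appears across samples. Representation is an illustrative assumption.
from collections import Counter

def build_partial_model(candidates):
    """Map each edge to its empirical probability across candidate models."""
    counts = Counter(edge for model in candidates for edge in set(model))
    n = len(candidates)
    return {edge: c / n for edge, c in counts.items()}

# Three hypothetical candidate models sampled for the same description.
samples = [
    [("Order", "has", "Item"), ("Order", "paidBy", "Invoice")],
    [("Order", "has", "Item"), ("Order", "paidBy", "Payment")],
    [("Order", "has", "Item"), ("Order", "paidBy", "Invoice")],
]
partial = build_partial_model(samples)
print(partial[("Order", "has", "Item")])  # appears in all 3 samples → 1.0
```

Elements that all samples agree on get probability 1.0, while hallucinated or uncertain elements receive lower weight, which the subsequent concretization step can exploit.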
Problem

Research questions and friction points this paper is trying to address.

Address syntax violations in LLM-generated graph models
Resolve constraint inconsistencies in generated graph structures
Reduce inaccuracies and hallucinations in model elements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses probabilistic partial model aggregation
Refines models to satisfy all constraints
Enhances consistency with multiple LLM outputs
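The refinement idea in the bullets above can be illustrated with a minimal concretization sketch: keep majority elements from the probabilistic partial model, then enforce a sample domain constraint. The constraint (at most one `paidBy` target per source) and all names are hypothetical stand-ins for the paper's constraint-driven optimization:

```python
# Sketch of the "concretization" step: refine a probabilistic partial
# model (edge -> probability) into one concrete model by keeping
# majority edges and resolving conflicts on a functional association
# (at most one 'paidBy' target per source) by probability.
def concretize(partial, threshold=0.5):
    # Keep only elements a majority of candidate models agree on.
    kept = {e: p for e, p in partial.items() if p >= threshold}
    # For each source, retain only its most probable 'paidBy' edge.
    best = {}
    for (src, label, tgt), p in kept.items():
        if label == "paidBy":
            if src not in best or p > kept[best[src]]:
                best[src] = (src, label, tgt)
    return [e for e in kept if e[1] != "paidBy" or best[e[0]] == e]

partial = {
    ("Order", "has", "Item"): 1.0,
    ("Order", "paidBy", "Invoice"): 0.67,
    ("Order", "paidBy", "Payment"): 0.33,
}
print(concretize(partial))
# → [('Order', 'has', 'Item'), ('Order', 'paidBy', 'Invoice')]
```

A real implementation would optimize over all constraints jointly rather than greedily per association, but the sketch shows how ensemble probabilities and symbolic constraints can interact.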
Boqi Chen
Electrical and Computer Engineering, McGill University, Canada
Ou Wei
Huawei Research Canada, Canada
Bingzhou Zheng
Huawei Research Canada, Canada
Gunter Mussbacher
Associate Professor, McGill University
Requirements Engineering, Model-Driven Engineering, Software Language Engineering, Human Values, Sustainability