Accurate and Consistent Graph Model Generation from Text with Large Language Models

📅 2025-07-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses three critical challenges in generating graph models from natural language using large language models (LLMs): syntactic violations (deviations from the metamodel), constraint inconsistencies (violations of domain-specific rules), and content hallucinations (introduction of spurious elements). To tackle these, we propose an “abstraction–concretization” framework: first, aggregating outputs from multiple LLMs to construct a probabilistic partial model; then, refining it into a complete, constraint-compliant graph model via a constraint-driven optimization process. The method integrates probabilistic modeling, symbolic constraint reasoning, and self-consistency-based ensemble mechanisms. Extensive experiments across diverse open- and closed-source LLMs demonstrate substantial improvements in syntactic correctness, constraint adherence, and semantic fidelity—effectively suppressing structural errors and hallucinations. Our approach establishes a verifiable, constraint-governed generation paradigm for model-driven engineering powered by LLMs.

📝 Abstract
Graph model generation from natural language description is an important task with many applications in software engineering. With the rise of large language models (LLMs), there is a growing interest in using LLMs for graph model generation. Nevertheless, LLM-based graph model generation typically produces partially correct models that suffer from three main issues: (1) syntax violations: the generated model may not adhere to the syntax defined by its metamodel, (2) constraint inconsistencies: the structure of the model might not conform to some domain-specific constraints, and (3) inaccuracy: due to the inherent uncertainty in LLMs, the models can include inaccurate, hallucinated elements. While the first issue is often addressed through techniques such as constraint decoding or filtering, the latter two remain largely unaddressed. Motivated by recent self-consistency approaches in LLMs, we propose a novel abstraction-concretization framework that enhances the consistency and quality of generated graph models by considering multiple outputs from an LLM. Our approach first constructs a probabilistic partial model that aggregates all candidate outputs and then refines this partial model into the most appropriate concrete model that satisfies all constraints. We evaluate our framework on several popular open-source and closed-source LLMs using diverse datasets for model generation tasks. The results demonstrate that our approach significantly improves both the consistency and quality of the generated graph models.
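The abstraction step described above — aggregating several candidate outputs into a probabilistic partial model — can be sketched roughly as follows. The edge-list representation, function names, and example metamodel elements here are illustrative assumptions, not the paper's actual data structures:

```python
# Sketch of the "abstraction" step: aggregate candidate graph models
# (here, simple (source, label, target) edge lists) sampled from an LLM
# into a probabilistic partial model recording how often each element
# appears across samples. Representation is an illustrative assumption.
from collections import Counter

def build_partial_model(candidates):
    """Map each edge to its empirical probability across candidate models."""
    counts = Counter(edge for model in candidates for edge in set(model))
    n = len(candidates)
    return {edge: c / n for edge, c in counts.items()}

# Three hypothetical candidate models sampled for the same description.
samples = [
    [("Order", "has", "Item"), ("Order", "paidBy", "Invoice")],
    [("Order", "has", "Item"), ("Order", "paidBy", "Payment")],
    [("Order", "has", "Item"), ("Order", "paidBy", "Invoice")],
]
partial = build_partial_model(samples)
print(partial[("Order", "has", "Item")])  # appears in all 3 samples → 1.0
```

Elements that all samples agree on get probability 1.0, while hallucinated or uncertain elements receive lower weight, which the subsequent concretization step can exploit.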
Problem

Research questions and friction points this paper is trying to address.

Address syntax violations in LLM-generated graph models
Resolve constraint inconsistencies in generated graph structures
Reduce inaccuracies and hallucinations in model elements
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses probabilistic partial model aggregation
Refines models to satisfy all constraints
Enhances consistency with multiple LLM outputs
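The refinement idea in the bullets above can be illustrated with a minimal concretization sketch: keep majority elements from the probabilistic partial model, then enforce a sample domain constraint. The constraint (at most one `paidBy` target per source) and all names are hypothetical stand-ins for the paper's constraint-driven optimization:

```python
# Sketch of the "concretization" step: refine a probabilistic partial
# model (edge -> probability) into one concrete model by keeping
# majority edges and resolving conflicts on a functional association
# (at most one 'paidBy' target per source) by probability.
def concretize(partial, threshold=0.5):
    # Keep only elements a majority of candidate models agree on.
    kept = {e: p for e, p in partial.items() if p >= threshold}
    # For each source, retain only its most probable 'paidBy' edge.
    best = {}
    for (src, label, tgt), p in kept.items():
        if label == "paidBy":
            if src not in best or p > kept[best[src]]:
                best[src] = (src, label, tgt)
    return [e for e in kept if e[1] != "paidBy" or best[e[0]] == e]

partial = {
    ("Order", "has", "Item"): 1.0,
    ("Order", "paidBy", "Invoice"): 0.67,
    ("Order", "paidBy", "Payment"): 0.33,
}
print(concretize(partial))
# → [('Order', 'has', 'Item'), ('Order', 'paidBy', 'Invoice')]
```

A real implementation would optimize over all constraints jointly rather than greedily per association, but the sketch shows how ensemble probabilities and symbolic constraints can interact.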
Boqi Chen
Electrical and Computer Engineering, McGill University, Canada
Ou Wei
Huawei Research Canada, Canada
Bingzhou Zheng
Huawei Research Canada, Canada
Gunter Mussbacher
Associate Professor, McGill University
Requirements Engineering, Model-Driven Engineering, Software Language Engineering, Human Values, Sustainability