LLM-based Generation of Semantically Diverse and Realistic Domain Model Instances

📅 2026-04-11

🤖 AI Summary

Existing approaches to domain model instantiation focus mainly on structural validity and struggle to balance semantic fidelity and diversity, often yielding instances whose values lack real-world relevance and are hard to interpret. This work proposes an approach that uses large language models (LLMs) for this task, combining two prompting strategies, UML class diagram parsing, and existing model validation tools to automatically generate instances that are syntactically correct, conform to the domain model, and are semantically meaningful. Going beyond structural validity alone, the approach aims to make the generated instances semantically realistic and diverse within their domain while keeping the values inside each instance coherent; the reported evaluation finds mostly correct, conforming instances with only a few semantic errors.
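The approach relies on existing model validation tools to ensure that generated instances conform to the class diagram; those tools are not named here. As a rough illustration of what a structural conformance check covers, the following is a minimal sketch assuming a hypothetical, simplified in-memory encoding of a class diagram (classes with typed attributes, associations with multiplicities). The names `ClassDiagram`, `Association`, `Obj` and `conformance_errors` are invented for this example and are not the authors' tooling.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical, simplified encoding of a UML class diagram (not the paper's format).
@dataclass
class Association:
    source: str              # source class name
    target: str              # target class name
    role: str                # role name used on the source side
    lower: int               # lower multiplicity bound
    upper: Optional[int]     # upper bound; None stands for "*"

@dataclass
class ClassDiagram:
    classes: Dict[str, Dict[str, type]]  # class name -> {attribute: Python type}
    associations: List[Association] = field(default_factory=list)

@dataclass
class Obj:
    name: str
    cls: str
    values: Dict[str, object]
    links: Dict[str, List[str]] = field(default_factory=dict)  # role -> target object names

def conformance_errors(model: ClassDiagram, objects: List[Obj]) -> List[str]:
    """Return structural violations; an empty list means the instance conforms."""
    errors: List[str] = []
    by_name = {o.name: o for o in objects}
    for o in objects:
        attrs = model.classes.get(o.cls)
        if attrs is None:
            errors.append(f"{o.name}: unknown class {o.cls}")
            continue
        # Every declared attribute needs a value of the declared type.
        for attr, typ in attrs.items():
            if attr not in o.values:
                errors.append(f"{o.name}.{attr}: missing value")
            elif not isinstance(o.values[attr], typ):
                errors.append(f"{o.name}.{attr}: expected {typ.__name__}")
        # Links must respect multiplicities and point to objects of the target class.
        for assoc in model.associations:
            if assoc.source != o.cls:
                continue
            targets = o.links.get(assoc.role, [])
            n = len(targets)
            if n < assoc.lower or (assoc.upper is not None and n > assoc.upper):
                errors.append(f"{o.name}.{assoc.role}: multiplicity violated ({n} links)")
            for t in targets:
                if t not in by_name or by_name[t].cls != assoc.target:
                    errors.append(f"{o.name}.{assoc.role}: invalid link to {t}")
    return errors

# Example: a tiny library model and a generated instance with one type error.
model = ClassDiagram(
    classes={"Library": {"name": str}, "Book": {"title": str, "year": int}},
    associations=[Association("Library", "Book", "books", 1, None)],
)
objects = [
    Obj("lib1", "Library", {"name": "Central City Library"}, {"books": ["b1"]}),
    Obj("b1", "Book", {"title": "Dune", "year": "1965"}),
]
print(conformance_errors(model, objects))  # ['b1.year: expected int']
```

A check like this only catches structural problems. A value such as a publication year of 3050 would pass it, which is why semantic correctness has to be evaluated separately from syntactic correctness and model conformance.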

📝 Abstract

Large Language Models (LLMs) have recently been proposed for supporting domain modeling tasks, mostly related to the completion of partial models by recommending additional model elements. However, there are many more modeling tasks, one of them being the instantiation of domain models to represent concrete domain objects. While there is considerable work supporting the generation of structurally valid instantiations, open challenges remain in incorporating real-world semantics through realistic values in the instances and in ensuring the generation of semantically diverse models. Only then will such generated models become human-understandable and helpful in educational or data-driven research contexts. To tackle these challenges, this paper presents an approach that employs LLMs and two prompting strategies in combination with existing model validation tools for instantiating semantically realistic and diverse domain models expressed as UML class diagrams. We have applied our approach to models from different domains that are used in education and available in the literature, and evaluated the generated instances in terms of syntactic correctness, model conformance, semantic correctness, and diversity of the generated values. The results show that the generated instances are mostly syntactically correct, that they conform to the domain model, and that there are only a few semantic errors. Moreover, the generated instance values are semantically diverse, i.e., they are concrete, realistic examples in line with the domain, and the combination of values within one model is semantically coherent.
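Diversity of the generated values is one of the evaluation dimensions, but the abstract does not say how it is measured. One crude, hypothetical proxy is to count how often attribute values repeat across several generated instance models: for each `Class.attribute`, divide the number of distinct values by the number of occurrences. The function `distinct_value_ratio` and the list-of-pairs instance encoding are assumptions made for this sketch, not the paper's metric.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical simplified instance model: a list of (class name, attribute values).
InstanceModel = List[Tuple[str, Dict[str, object]]]

def distinct_value_ratio(models: List[InstanceModel]) -> Dict[str, float]:
    """For each 'Class.attribute', distinct values divided by total occurrences.

    1.0 means no value repeats across the generated models; values must be hashable.
    """
    values: Dict[str, list] = defaultdict(list)
    for model in models:
        for cls, attrs in model:
            for attr, value in attrs.items():
                values[f"{cls}.{attr}"].append(value)
    return {key: len(set(vals)) / len(vals) for key, vals in values.items()}

# Three generated instance models of the same Book class, one of them a repeat.
runs = [
    [("Book", {"title": "Dune", "year": 1965})],
    [("Book", {"title": "Emma", "year": 1815})],
    [("Book", {"title": "Dune", "year": 1965})],
]
ratios = distinct_value_ratio(runs)
print(ratios)  # both Book.title and Book.year come out at 2/3
```

This ratio only measures lexical distinctness. The semantic diversity described above (realistic, domain-appropriate values whose combination within one model is coherent) is richer and cannot be judged by counting alone; a set of unique but nonsensical titles would still score 1.0.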

Problem

Research questions and friction points this paper is trying to address.

semantic diversity

realistic instances

domain modeling

model instantiation

Large Language Models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models

Semantic Diversity

Domain Model Instantiation