Integrating Domain Knowledge into Process Discovery Using Large Language Models

📅 2025-10-08

🤖 AI Summary

Event logs frequently contain noise and missing entries, and conventional process discovery methods disregard domain knowledge, so the discovered models can be unreliable for downstream tasks. The paper proposes an interactive process discovery framework that integrates large language models (LLMs). LLMs extract declarative process rules from natural language descriptions provided by domain experts; the IMr discovery algorithm then combines these rules with the event log to recursively construct a process model, avoiding structures that contradict domain knowledge. The framework coordinates interactions among the LLM, domain experts, and backend services. The evaluation covers multiple LLMs and prompt engineering strategies and includes a case study on a real-life event log in which domain experts assessed the framework's usability and effectiveness.

📝 Abstract

Process discovery aims to derive process models from event logs, providing insights into operational behavior and forming a foundation for conformance checking and process improvement. However, models derived solely from event data may not accurately reflect the real process, as event logs are often incomplete or affected by noise, and domain knowledge, an important complementary resource, is typically disregarded. As a result, the discovered models may lack reliability for downstream tasks. We propose an interactive framework that incorporates domain knowledge, expressed in natural language, into the process discovery pipeline using Large Language Models (LLMs). Our approach leverages LLMs to extract declarative rules from textual descriptions provided by domain experts. These rules are used to guide the IMr discovery algorithm, which recursively constructs process models by combining insights from both the event log and the extracted rules, helping to avoid problematic process structures that contradict domain knowledge. The framework coordinates interactions among the LLM, domain experts, and a set of backend services. We present a fully implemented tool that supports this workflow and conduct an extensive evaluation of multiple LLMs and prompt engineering strategies. Our empirical study includes a case study based on a real-life event log with the involvement of domain experts, who assessed the usability and effectiveness of the framework.
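
To make the notion of a declarative rule concrete, the following minimal Python sketch checks traces of an event log against a few Declare-style constraints. The `Rule` class, the three template names, and the `violation_counts` helper are hypothetical and chosen only for illustration; the abstract does not state which rule templates the framework supports or how IMr uses them internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A binary declarative rule (hypothetical representation)."""
    template: str  # "response", "precedence", or "not-coexistence"
    a: str
    b: str


def satisfies(trace, rule):
    """Return True if a single trace (list of activity names) obeys the rule."""
    if rule.template == "response":
        # Every occurrence of a must eventually be followed by b.
        pending = False
        for activity in trace:
            if activity == rule.a:
                pending = True
            elif activity == rule.b:
                pending = False
        return not pending
    if rule.template == "precedence":
        # b may only occur once a has already occurred.
        seen_a = False
        for activity in trace:
            if activity == rule.a:
                seen_a = True
            elif activity == rule.b and not seen_a:
                return False
        return True
    if rule.template == "not-coexistence":
        # a and b must never appear in the same trace.
        return not (rule.a in trace and rule.b in trace)
    raise ValueError(f"unknown template: {rule.template}")


def violation_counts(log, rules):
    """Count, per rule, how many traces in the log violate it."""
    return {rule: sum(not satisfies(trace, rule) for trace in log) for rule in rules}


log = [
    ["register", "check", "approve"],
    ["register", "approve"],
    ["approve", "register"],  # likely noise: approval recorded before registration
]
rules = [
    Rule("precedence", "register", "approve"),
    Rule("response", "register", "approve"),
]
counts = violation_counts(log, rules)
```

Here the third trace violates both rules, which is the kind of conflict between log and domain knowledge that a rule-guided discovery algorithm would need to resolve.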

Problem

Research questions and friction points this paper is trying to address.

Process discovery methods often ignore valuable domain knowledge from experts

Event logs alone produce unreliable models due to incompleteness and noise

Current methods lack integration of natural language domain constraints

Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrating domain knowledge via Large Language Models

Extracting declarative rules from expert textual descriptions

Guiding process discovery with combined log and rule insights
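
The rule extraction step can be sketched as a prompt builder plus a strict parser for the model's reply. This is an illustrative sketch only: the prompt wording, the `template(A, B)` output format, and the function names are assumptions rather than the paper's actual prompts, and the LLM call itself is deliberately left out.

```python
import re

# Hypothetical set of rule templates the prompt asks the model to use.
TEMPLATES = ("response", "precedence", "not-coexistence")

# Matches lines such as "precedence(register, approve)".
RULE_LINE = re.compile(r"^\s*([a-z\-]+)\(\s*([^,()]+?)\s*,\s*([^,()]+?)\s*\)\s*$")


def build_prompt(description, activities):
    """Assemble a prompt asking an LLM to emit one rule per line."""
    return (
        "You translate process descriptions into declarative rules.\n"
        f"Allowed templates: {', '.join(TEMPLATES)}.\n"
        f"Allowed activities: {', '.join(activities)}.\n"
        "Answer with one rule per line in the form template(A, B).\n\n"
        f"Description: {description}"
    )


def parse_rules(reply, activities):
    """Keep only well-formed rules over known templates and activities."""
    known = set(activities)
    rules = []
    for line in reply.splitlines():
        match = RULE_LINE.match(line)
        if not match:
            continue  # skip chatter such as "Here are the rules:"
        template, a, b = match.groups()
        if template in TEMPLATES and a in known and b in known:
            rules.append((template, a, b))
    return rules


activities = ["register", "approve", "reject"]
prompt = build_prompt("A request is never approved before it is registered.", activities)

# A made-up reply, standing in for what an LLM might return.
reply = (
    "Here are the rules:\n"
    "precedence(register, approve)\n"
    "response(register, ship)\n"
    "not-coexistence(approve, reject)\n"
)
parsed = parse_rules(reply, activities)
```

Discarding rules that mention unknown activities (such as `ship` above) is one simple guard against hallucinated output; the surviving rules could then be shown to the domain expert for confirmation before discovery runs.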

🔎 Similar Papers

Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval