Human-Centered Evaluation of an LLM-Based Process Modeling Copilot: A Mixed-Methods Study with Domain Experts

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses a critical gap in current LLM-driven BPMN modeling tools, which largely overlook human factors and fail to meet the authentic needs of domain experts. Employing a mixed-methods approach that integrates focus groups with standardized usability questionnaires (e.g., CUQ), this work presents the first systematic evaluation of user experience and acceptance of LLM-based process modeling collaborators. The findings reveal a significant tension between moderate usability (mean score: 67.2/100) and low trust (only 48.8%), identifying reliability as a key bottleneck. Key limitations include insufficient output quality and the absence of deep follow-up questioning mechanisms. To address these challenges, the study advocates for integrating human-centered evaluation with automated benchmarking and proposes five practical application scenarios, establishing a new paradigm for trustworthy and effective AI-assisted process modeling.

📝 Abstract
Integrating Large Language Models (LLMs) into business process management tools promises to democratize Business Process Model and Notation (BPMN) modeling for non-experts. While automated frameworks assess syntactic and semantic quality, they miss human factors like trust, usability, and professional alignment. We conducted a mixed-methods evaluation of our proposed solution, an LLM-powered BPMN copilot, with five process modeling experts using focus groups and standardized questionnaires. Our findings reveal a critical tension between acceptable perceived usability (mean CUQ score: 67.2/100) and notably lower trust (mean score: 48.8%), with reliability rated as the most critical concern (M=1.8/5). Furthermore, we identified output-quality issues, prompting difficulties, and a need for the LLM to ask more in-depth clarifying questions about the process. We envision five use cases ranging from domain-expert support to enterprise quality assurance. We demonstrate the necessity of human-centered evaluation complementing automated benchmarking for LLM modeling agents.
Problem

Research questions and friction points this paper is trying to address.

Human-Centered Evaluation
Large Language Models
Business Process Modeling
Trust
Usability

Innovation

Methods, ideas, or system contributions that make the work stand out.

human-centered evaluation
LLM copilot
BPMN modeling
trust-usability tension
mixed-methods study