Validating Threat Modeling Results with the Help of Vulnerable Test Applications

📅 2026-05-22

🤖 AI Summary

This study addresses the lack of objective validation criteria in existing threat modeling approaches, which often rely on expert judgment and are therefore prone to omissions and inconsistencies. To address this limitation, the authors propose a quantifiable and reproducible evaluation methodology based on benchmark applications with known vulnerabilities, specifically AzureGoat and the Vulnerable Bank Application (VulnBank). Both tools receive only architecture diagrams, data flow diagrams, and their textual descriptions as input. Under this setup, the study compares the vulnerability coverage of ThreMoLIA, an LLM-assisted threat modeling solution developed by the authors, with that of the Microsoft Threat Modeling Tool (MTMT). ThreMoLIA achieved higher vulnerability coverage on both benchmark applications. The work uses intentionally vulnerable applications as a validation benchmark for threat modeling, complementing rather than replacing traditional expert-based assessments.

📝 Abstract

Validating threat modeling results remains difficult because completeness is hard to judge without an external oracle. Existing studies often rely on expert-produced reference models and other human baselines, but these can contain omissions or disagreements. This paper evaluates a complementary, vulnerability-grounded validation approach. We apply threat modeling to intentionally vulnerable applications with a known vulnerability set to measure the number of related vulnerabilities that can be discovered. We compare ThreMoLIA, an LLM-assisted threat modeling solution developed by our team, with the Microsoft Threat Modeling Tool (MTMT) across two vulnerable applications: AzureGoat and the Vulnerable Bank Application (VulnBank). The inputs to both tools are limited to architecture, data flow diagrams, and their descriptions. The results show that ThreMoLIA achieved higher vulnerability coverage on both systems. We show that vulnerable test applications provide a practical benchmark for assessing threat coverage and complement expert-based validation.
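The coverage measure described in the abstract can be illustrated with a minimal sketch. It assumes that each generated threat has already been mapped, for example by an analyst, to zero or more IDs from the benchmark's documented vulnerability set. The names `vulnerability_coverage`, `CoverageResult`, and the IDs `V1`, `T1`, and so on are hypothetical and are not taken from the paper or from either tool. Coverage is taken here as the fraction of known vulnerabilities addressed by at least one threat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoverageResult:
    covered: frozenset  # known vulnerability IDs matched by at least one threat
    missed: frozenset   # known vulnerability IDs no threat addressed

    @property
    def ratio(self) -> float:
        # Fraction of the known vulnerability set that was covered.
        total = len(self.covered) + len(self.missed)
        return len(self.covered) / total if total else 0.0


def vulnerability_coverage(known: set, matched: dict) -> CoverageResult:
    """Hypothetical helper.

    known:   IDs of the benchmark application's documented vulnerabilities.
    matched: threat ID -> set of known-vulnerability IDs that threat was judged to address
             (the matching itself is assumed to be done beforehand, e.g. manually).
    """
    hit = set()
    for vulns in matched.values():
        # Ignore any mapping to an ID outside the known set.
        hit |= vulns & known
    return CoverageResult(frozenset(hit), frozenset(known - hit))


# Illustrative input only, not data from the study.
known = {"V1", "V2", "V3", "V4"}
matched = {"T1": {"V1"}, "T2": {"V1", "V3"}, "T3": set()}
result = vulnerability_coverage(known, matched)
print(sorted(result.covered), sorted(result.missed), result.ratio)
```

A tool that raises several threats for the same vulnerability receives no extra credit, because coverage is counted over the known set rather than over the generated threats. Running the same function on each tool's output for a given benchmark application makes the comparison reproducible once the threat-to-vulnerability mapping is fixed.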

Problem

Research questions and friction points this paper is trying to address.

threat modeling validation

vulnerability coverage

test applications

completeness assessment

security evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

vulnerability-grounded validation

threat modeling

LLM-assisted security analysis