🤖 AI Summary
This study addresses the disconnect between unstructured incident narratives and economic loss quantification in chemical industry fire risk assessment. We propose a method for constructing a risk index that jointly models accident text descriptions and property loss data. Methodologically, we integrate LDA topic modeling with network embedding to extract causal semantic clusters (such as hazardous chemical leakage, unsafe storage, and equipment failure) and employ Lasso regression to select and quantify their associations with financial losses, yielding an interpretable, context-aware financial risk index. Our contribution is twofold: (1) the first systematic integration of semantic modeling and sparse regression for fire risk quantification, enabling an end-to-end, interpretable mapping from textual evidence to financial impact; and (2) empirical validation demonstrating superior discriminative power and practical utility over conventional statistical metrics in identifying high-risk factors and predicting losses.
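The Lasso step summarised above can be illustrated with a minimal sketch. It assumes each report is encoded as a binary keyword-indicator vector and that the regression target is log(1 + damage); these encodings, the toy terms and damage amounts, and the function names are hypothetical and not taken from the paper. The sketch implements cyclic coordinate descent with soft-thresholding in plain Python, using the same objective scaling as scikit-learn's `Lasso`.

```python
import math


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) used in Lasso coordinate descent."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(X, y, alpha, n_iter=200):
    """Minimise (1/(2n)) * ||y - b0 - Xw||^2 + alpha * ||w||_1 by cyclic coordinate descent.

    X is a list of n rows with p features; y is a list of n targets.
    The intercept b0 is left unpenalised by centring X and y.
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residual of the centred model with w = 0
    col_sq = [sum(Xc[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column (e.g. a keyword in every report): keep w_j = 0
            # Correlation of feature j with the partial residual that excludes feature j.
            rho = sum(Xc[i][j] * (resid[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha) / col_sq[j]
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= Xc[i][j] * delta
                w[j] = new_wj
    b0 = y_mean - sum(x_mean[j] * w[j] for j in range(p))
    return b0, w


if __name__ == "__main__":
    # Hypothetical toy data: keyword indicators per report and property damage amounts.
    terms = ["leakage", "storage", "lightning"]
    X = [[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1], [1, 0, 1], [0, 0, 0]]
    damage = [90000, 60000, 8000, 5000, 70000, 1000]
    y = [math.log1p(d) for d in damage]  # assumed log(1 + damage) target
    b0, w = lasso_cd(X, y, alpha=0.05)
    for term, coef in sorted(zip(terms, w), key=lambda t: -abs(t[1])):
        print(f"{term:10s} {coef:+.3f}")
```

A larger `alpha` drives more keyword coefficients exactly to zero, which is what makes the resulting term weights sparse and interpretable; in practice `alpha` would typically be chosen by cross-validation.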
📝 Abstract
Fire incident reports contain detailed textual narratives that capture causal factors often overlooked in structured records, while financial damage amounts provide measurable outcomes of these events. Integrating these two sources of information is essential for uncovering interpretable links between descriptive causes and their economic consequences. To this end, we develop a data-driven framework that constructs a composite Risk Index, enabling systematic quantification of how specific keywords relate to property damage amounts. This index facilitates both the identification of high-impact terms and the aggregation of risk across semantically related clusters, thereby offering a principled measure of fire-related financial risk. Using more than a decade of Korean fire investigation reports (2013 through 2024) on chemical industry facilities classified as Special Buildings, we employ topic modeling and network-based embedding to estimate semantic similarities from interactions among words, and subsequently apply Lasso regression to quantify their associations with property damage amounts, thereby estimating the fire risk index. This approach enables us to assess fire risk not only at the level of individual terms but also within their broader textual context, where clusters of closely interacting words reveal collective patterns of hazard representation and their potential impact on expected losses. The analysis highlights several domains of risk, including hazardous chemical leakage, unsafe storage practices, equipment and facility malfunctions, and environmentally induced ignition. The results demonstrate that text-derived indices provide interpretable and practically relevant insights, bridging unstructured narratives with structured loss information and offering a basis for evidence-based fire risk assessment and management.
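Estimating semantic similarity from word interactions and aggregating risk across related clusters can be sketched as follows. This stand-in swaps the learned topic-model and network embeddings for a simpler technique: cosine similarity between rows of a term co-occurrence matrix. Terms whose similarity reaches a threshold are grouped into connected components, and each cluster's score is the sum of its members' regression coefficients. The 0.5 threshold, the summation rule, the toy keywords and coefficients, and all function names are hypothetical assumptions rather than the paper's definition of the Risk Index.

```python
import math
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(docs):
    """Weighted term co-occurrence network built from tokenised reports.

    Off-diagonal weight = number of reports containing both terms; the
    diagonal holds each term's document frequency, so terms that always
    appear together get similar row vectors.
    """
    net = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        terms = sorted(set(doc))
        for t in terms:
            net[t][t] += 1
        for a, b in combinations(terms, 2):
            net[a][b] += 1
            net[b][a] += 1
    return net


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_clusters(terms, net, threshold=0.5):
    """Connected components of the graph linking terms with cosine >= threshold."""
    unseen = set(terms)
    clusters = []
    for seed in terms:
        if seed not in unseen:
            continue
        unseen.discard(seed)
        stack, component = [seed], [seed]
        while stack:
            t = stack.pop()
            for other in list(unseen):
                if cosine(net[t], net[other]) >= threshold:
                    unseen.discard(other)
                    stack.append(other)
                    component.append(other)
        clusters.append(sorted(component))
    return clusters


def cluster_risk(clusters, coefs):
    """Hypothetical aggregation: sum per-term coefficients (e.g. from Lasso) within each cluster."""
    return {tuple(c): sum(coefs.get(t, 0.0) for t in c) for c in clusters}


if __name__ == "__main__":
    # Hypothetical tokenised report keywords and term coefficients (illustrative only).
    docs = [["leak", "valve", "storage"], ["leak", "valve"], ["lightning", "roof"]]
    coefs = {"leak": 2.0, "valve": 0.5, "storage": 0.0, "lightning": 1.0, "roof": 0.0}
    net = cooccurrence_network(docs)
    clusters = similarity_clusters(list(coefs), net)
    for members, score in cluster_risk(clusters, coefs).items():
        print(members, round(score, 3))
```

Summing coefficients treats related terms as additive contributions to expected loss; a similarity-weighted sum or a per-cluster mean would be equally plausible choices, and the abstract does not state which aggregation rule the authors use.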