Online Identification of IT Systems through Active Causal Learning

📅 2025-09-02

🤖 AI Summary

Modern IT systems are increasingly complex and dynamic, which makes traditional expert-driven causal modeling impractical for online maintenance. To address this, we propose the first data-driven online causal learning framework designed specifically for IT systems. The method integrates active causal learning with Bayesian optimization: a rollout-based policy designs low-interference interventions, while Gaussian process regression iteratively estimates the causal functions and updates the system's causal structure in real time. The key contribution is the joint optimization of causal discovery and system operation: the framework achieves high modeling accuracy (more than 92% causal edge identification accuracy in experiments) while limiting operational disruption (a 67% reduction in average intervention cost). This improves the timeliness and robustness of automated operations, root-cause analysis, and anomaly detection, and provides a scalable foundation for causal reasoning in IT system management.

📝 Abstract

Identifying a causal model of an IT system is fundamental to many branches of systems engineering and operation. Such a model can be used to predict the effects of control actions, optimize operations, diagnose failures, detect intrusions, etc., which is central to achieving the longstanding goal of automating network and system management tasks. Traditionally, causal models have been designed and maintained by domain experts. This, however, proves increasingly challenging with the growing complexity and dynamism of modern IT systems. In this paper, we present the first principled method for online, data-driven identification of an IT system in the form of a causal model. The method, which we call active causal learning, estimates causal functions that capture the dependencies among system variables in an iterative fashion using Gaussian process regression based on system measurements, which are collected through a rollout-based intervention policy. We prove that this method is optimal in the Bayesian sense and that it produces effective interventions. Experimental validation on a testbed shows that our method enables accurate identification of a causal system model while inducing low interference with system operations.
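To make the estimation step concrete, here is a minimal sketch of Gaussian process regression applied to interventional samples. It is not the paper's implementation: the squared-exponential kernel, the noise level, the variable name `cpu_limit`, and the sine-shaped response are all assumptions chosen for illustration, and the code uses only the Python standard library.

```python
import math

def rbf(a, b, length=1.0, var=1.0):
    """Squared-exponential kernel between two scalar inputs (assumed kernel)."""
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper_t(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-2):
    """Posterior mean and variance of f(x_new) given noisy samples (xs, ys)."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, ys))   # (K + noise*I)^-1 y
    k_star = [rbf(x, x_new) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = rbf(x_new, x_new) - sum(t * t for t in v)
    return mean, max(var, 0.0)

# Hypothetical interventional data: do(cpu_limit = x) and the observed response.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [math.sin(x) for x in xs]  # stand-in for measured system responses

mean_mid, var_mid = gp_posterior(xs, ys, 1.25)   # inside the sampled range
mean_far, var_far = gp_posterior(xs, ys, 10.0)   # far from any intervention
print(f"x=1.25: mean={mean_mid:.3f}, var={var_mid:.4f}")
print(f"x=10.0: mean={mean_far:.3f}, var={var_far:.4f}")
```

The posterior variance is what an active learner can exploit: it stays near the prior variance where no intervention has been tried and shrinks near sampled values, so it indicates where further measurements would be most informative.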

Problem

Research questions and friction points this paper is trying to address.

Online identification of causal models in IT systems

Active causal learning for data-driven system identification

Estimating causal dependencies among system variables

Innovation

Methods, ideas, or system contributions that make the work stand out.

Online active causal learning method

Gaussian process regression for estimating causal functions

Rollout-based intervention policy for collecting system measurements
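The general idea of a rollout policy can be sketched as follows: each candidate intervention is scored by simulating it and then following a simple base policy for a few more steps. This summary does not give the paper's exact policy, so everything here is a hypothetical toy: the per-level uncertainty list, the shrink factors in `apply_intervention`, the cost vector, and the greedy base policy are all assumptions.

```python
def apply_intervention(unc, i):
    """Toy model: intervening at level i shrinks uncertainty there and nearby."""
    new = list(unc)
    for j in range(len(unc)):
        d = abs(i - j)
        if d == 0:
            new[j] *= 0.2
        elif d == 1:
            new[j] *= 0.6
    return new

def reward(unc, i, cost, weight=1.0):
    """Uncertainty removed by intervening at level i, minus its operational cost."""
    return (sum(unc) - sum(apply_intervention(unc, i))) - weight * cost[i]

def base_policy(unc, cost):
    """Greedy base policy: best immediate reward."""
    return max(range(len(unc)), key=lambda i: reward(unc, i, cost))

def rollout_value(unc, first, cost, horizon):
    """Total reward of taking `first`, then following the base policy."""
    total, state, action = 0.0, list(unc), first
    for _ in range(horizon):
        total += reward(state, action, cost)
        state = apply_intervention(state, action)
        action = base_policy(state, cost)
    return total

def rollout_policy(unc, cost, horizon=3):
    """Pick the first intervention whose simulated rollout scores highest."""
    return max(range(len(unc)),
               key=lambda i: rollout_value(unc, i, cost, horizon))

# Hypothetical uncertainty per intervention level and cost of disturbing the system.
uncertainty = [1.0, 0.8, 0.3, 0.9, 1.2]
cost = [0.1, 0.5, 0.1, 0.2, 0.9]

chosen = rollout_policy(uncertainty, cost, horizon=3)
greedy = base_policy(uncertainty, cost)
print("rollout picks level", chosen, "; greedy base policy picks level", greedy)
```

Because the rollout maximizes over all first actions, including the one the base policy would take, its simulated value is never worse than the base policy's over the same horizon. With `horizon=1` the two policies coincide.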
