🤖 AI Summary
Detecting zero-day malicious domains hosted on low-reputation infrastructure is challenging because benign and malicious domains co-reside there, which leads to high false positive rates and delayed detection. Method: We propose a content-agnostic dynamic graph modeling approach: (1) constructing a temporal network graph based on domain-to-infrastructure hosting relationships to capture infrastructure reuse patterns; and (2) designing a lightweight temporal graph neural network for modeling anomalous hosting behavior with online inference capability. Contribution/Results: This work uses infrastructure-level hosting as the core detection dimension, enabling high-precision discrimination between malicious and co-resident benign domains. Experiments show 99.7% precision, 86.9% recall, and a false positive rate of only 0.1%. The method detects about 19,000 new malicious domains per day on average, over five times the number newly flagged daily in VirusTotal, and predicts them days to weeks before they appear in popular blocklists, supporting both on-demand prediction and daily blocklist generation.
📝 Abstract
Internet miscreants increasingly use short-lived, disposable domains to launch various attacks. Existing detection mechanisms either catch such malicious domains too late, because of limited information and the domains' short life spans, or fail to catch them at all because of evasive techniques such as cloaking and CAPTCHAs. In this work, we investigate the possibility of detecting malicious domains early in their life cycle using a content-agnostic approach. We observe that attackers often reuse or rotate hosting infrastructure to host multiple malicious domains, driven by increased automation and economies of scale. This gives defenders the opportunity to monitor such infrastructure to identify newly hosted malicious domains. However, such infrastructure is often a shared hosting environment that also hosts benign domains, which could result in a prohibitive number of false positives. Therefore, one needs innovative mechanisms to better distinguish malicious domains from benign ones even when they share hosting infrastructure. We build MANTIS, a highly accurate, practical system that not only generates daily blocklists of malicious domains but can also predict malicious domains on demand. We design a network graph based on the hosting infrastructure that is accurate and generalizable over time. Our models consistently achieve a precision of 99.7% and a recall of 86.9% with a very low false positive rate (FPR) of 0.1%, and on average detect 19K new malicious domains per day, over 5 times the number of new malicious domains flagged daily in VirusTotal. Further, MANTIS predicts malicious domains days to weeks before they appear in popular blocklists.
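To make the hosting-graph idea concrete, the following is a minimal illustrative sketch and not the MANTIS implementation. It assumes hypothetical (domain, IP) observations, for example from passive DNS, builds a bipartite domain-to-IP graph, and computes one naive signal: the fraction of a domain's co-hosted neighbors that are already known to be malicious. All names, domains, and addresses are invented for illustration.

```python
from collections import defaultdict

# Hypothetical toy observations: (domain, hosting IP) pairs.
RESOLUTIONS = [
    ("bad1.example", "203.0.113.5"),
    ("bad2.example", "203.0.113.5"),
    ("new.example", "203.0.113.5"),
    ("shop.example", "198.51.100.7"),
    ("blog.example", "198.51.100.7"),
    ("bad1.example", "198.51.100.7"),
]
# Hypothetical seed labels, e.g. taken from an existing blocklist.
KNOWN_MALICIOUS = {"bad1.example", "bad2.example"}


def build_hosting_graph(resolutions):
    """Build both sides of a bipartite domain <-> IP hosting graph."""
    domains_by_ip = defaultdict(set)
    ips_by_domain = defaultdict(set)
    for domain, ip in resolutions:
        domains_by_ip[ip].add(domain)
        ips_by_domain[domain].add(ip)
    return domains_by_ip, ips_by_domain


def malicious_neighbor_ratio(domain, domains_by_ip, ips_by_domain, known_malicious):
    """Fraction of a domain's co-hosted neighbors that are known malicious."""
    neighbors = set()
    for ip in ips_by_domain.get(domain, ()):
        neighbors |= domains_by_ip[ip]
    neighbors.discard(domain)  # a domain is not its own neighbor
    if not neighbors:
        return 0.0
    return len(neighbors & known_malicious) / len(neighbors)


domains_by_ip, ips_by_domain = build_hosting_graph(RESOLUTIONS)
for name in ("new.example", "shop.example"):
    score = malicious_neighbor_ratio(
        name, domains_by_ip, ips_by_domain, KNOWN_MALICIOUS
    )
    print(name, score)
```

In this toy data, `shop.example` receives a nonzero score only because it shares an IP with one malicious domain. This is the shared-hosting false positive problem the abstract describes, and it is why co-hosting alone is not a sufficient signal.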