DNS-GT: A Graph-based Transformer Approach to Learn Embeddings of Domain Names from DNS Queries

๐Ÿ“… 2026-03-11
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
This work addresses the limitations of existing machine learningโ€“based network intrusion detection methods that rely heavily on large volumes of labeled data and struggle to effectively capture contextual relationships among domain names when leveraging DNS query logs. To overcome these challenges, the authors propose DNS-GT, a novel self-supervised model that integrates graph structures with a Transformer architecture. DNS-GT is the first to apply a graph-augmented Transformer for modeling DNS query sequences, enabling pretraining to learn domain name embeddings rich in temporal and semantic dependencies, which can subsequently be fine-tuned for downstream tasks. Experimental results demonstrate that DNS-GT significantly outperforms current baselines in both domain classification and botnet detection, confirming the strong generalization capability of its learned representations and highlighting its practical potential for real-world cybersecurity applications.

๐Ÿ“ Abstract
Network intrusion detection systems play a crucial role in the security strategy employed by organisations to detect and prevent cyberattacks. Such systems usually combine pattern detection signatures with anomaly detection techniques powered by machine learning methods. However, the commonly proposed machine learning methods present drawbacks such as over-reliance on labeled data and limited generalization capabilities. To address these issues, embedding-based methods have been introduced to learn representations from network data that generalise effectively to many downstream tasks; DNS traffic is a common choice for this purpose, largely due to its wide availability. However, current approaches do not properly consider contextual information among DNS queries. In this paper, we tackle this issue by proposing DNS-GT, a novel Transformer-based model that learns embeddings for domain names from sequences of DNS queries. The model is first pre-trained in a self-supervised fashion in order to learn the general behavior of DNS activity. Then, it can be fine-tuned on specific downstream tasks, exploiting interactions with other relevant queries in a given sequence. Our experiments with real-world DNS data showcase the ability of our method to learn effective domain name representations. A quantitative evaluation on domain name classification and botnet detection tasks shows that our approach achieves better results compared to relevant baselines, creating opportunities for further exploration of large-scale language models for intrusion detection systems. Our code is available at: https://github.com/m-altieri/DNS-GT.
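As an illustration of the pretraining setup the abstract describes, the sketch below builds per-client sequences of DNS queries and applies BERT-style token masking as a plausible self-supervised objective. This is a minimal assumption-laden sketch: the paper's actual data schema, sequence construction, masking strategy, and graph components are not specified here, and the function names (`build_sequences`, `mask_sequence`) and the per-client grouping are hypothetical choices, not DNS-GT's published pipeline.

```python
import random

MASK = "[MASK]"  # placeholder token the model must reconstruct

def build_sequences(log, max_len=4):
    """Group DNS queries by client IP (in log order) into fixed-length windows.

    `log` is a list of (client_ip, queried_domain) pairs; this grouping rule
    is an assumption, chosen only to illustrate sequence construction.
    """
    per_client = {}
    for client_ip, domain in log:
        per_client.setdefault(client_ip, []).append(domain)
    sequences = []
    for domains in per_client.values():
        for i in range(0, len(domains), max_len):
            sequences.append(domains[i:i + max_len])
    return sequences

def mask_sequence(seq, mask_prob=0.25, rng=None):
    """Randomly replace domains with [MASK]; return (masked_seq, targets).

    `targets` maps masked positions back to the original domain, which a
    Transformer encoder would be trained to predict from context.
    """
    rng = rng or random.Random(0)
    masked, targets = [], {}
    for idx, domain in enumerate(seq):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[idx] = domain
        else:
            masked.append(domain)
    return masked, targets

# Toy DNS log: two clients, four queries.
log = [
    ("10.0.0.1", "example.com"),
    ("10.0.0.2", "update.windows.com"),
    ("10.0.0.1", "cdn.example.com"),
    ("10.0.0.1", "mail.example.com"),
]
seqs = build_sequences(log)
masked, targets = mask_sequence(seqs[0], mask_prob=0.5)
```

The masked sequences and their reconstruction targets would then be fed to the Transformer during pretraining, after which the encoder's per-domain representations can be fine-tuned for tasks such as botnet detection.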
Problem

Research questions and friction points this paper is trying to address.

DNS queries
contextual information
domain name embeddings
network intrusion detection
representation learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-based Transformer
DNS embedding
self-supervised learning
contextual modeling
intrusion detection
๐Ÿ”Ž Similar Papers
No similar papers found.
Massimiliano Altieri
Joint Research Centre, European Commission, Ispra, 21027, Italy.
Ronan Hamon
Joint Research Centre, European Commission, Ispra, 21027, Italy.
Roberto Corizzo
American University
Data Mining, Big Data, Continual Learning, Machine Learning
Michelangelo Ceci
University of Bari "A. Moro"
Artificial Intelligence, Machine Learning, Knowledge Discovery, Data Mining
Ignacio Sanchez
Joint Research Centre, European Commission, Ispra, 21027, Italy.