🤖 AI Summary
Existing web attack detection methods struggle to handle irregular HTTP requests, model unordered parameters, and provide attack traceability. To address these limitations, this work proposes WADBERT, a dual-channel BERT architecture in which URLBERT processes URL paths and SecBERT processes payload parameters. WADBERT combines hybrid-granularity embedding (HGE) with a multi-head attention mechanism that fuses parameter-level features, then joins the result with the URL's semantic features, enabling both high-precision detection and localization of malicious parameters. Notably, this approach is the first to enable parameter-level identification of malicious features and support attack attribution. Experimental results demonstrate state-of-the-art performance, with F1 scores of 99.63% on CSIC2010 and 99.50% on SR-BH2020, significantly outperforming existing methods.
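A dual-channel design presupposes that each request is first split into a URL path (for URLBERT) and a set of payload parameters (for SecBERT). The abstract does not describe this parsing step, so the following is a minimal sketch under stated assumptions: parameters arrive as `application/x-www-form-urlencoded` data in the query string or body, and query and body parameters are pooled into one list. The function name `split_request` and the example URL are hypothetical.

```python
from __future__ import annotations

from urllib.parse import parse_qsl, urlsplit


def split_request(target: str, body: str = "") -> tuple[str, list[tuple[str, str]]]:
    """Hypothetical pre-processing step (not described in the abstract):
    separate the URL path, for the URL channel, from the payload
    parameters, for the payload channel. Assumes form-urlencoded data."""
    parts = urlsplit(target)
    # Query-string parameters, percent-decoded; blank values are kept
    # so that no parameter is silently dropped.
    params = parse_qsl(parts.query, keep_blank_values=True)
    # Body parameters (e.g. from a POST form) join the same pool.
    if body:
        params += parse_qsl(body, keep_blank_values=True)
    return parts.path, params


path, params = split_request("/shop/cart.jsp?id=2&qty=1%27%20OR%20%271%27%3D%271")
print(path)    # /shop/cart.jsp
print(params)  # [('id', '2'), ('qty', "1' OR '1'='1")]
```

Percent-decoding matters here: the injected `qty` value only becomes recognizable as SQL once `%27` and `%3D` are decoded. JSON or multipart bodies would need a different parser.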
📝 Abstract
Web attack detection is the first line of defense for securing web applications, designed to preemptively identify malicious activities. Deep learning-based approaches are increasingly popular because they automatically learn complex patterns and extract semantic features from HTTP requests, achieving superior detection performance. However, existing methods are less effective at embedding irregular HTTP requests, and even fail to model unordered parameters or support attack traceability. In this paper, we propose WADBERT, a web attack detection model that achieves high detection accuracy while precisely identifying malicious parameters. First, Hybrid Granularity Embedding (HGE) generates fine-grained embeddings for the URL and the payload parameters. Next, URLBERT and SecBERT extract semantic features from the URL and the payload parameters, respectively. The parameter-level features extracted by SecBERT are then fused through a multi-head attention mechanism into a comprehensive payload feature. Finally, the URL and payload features are concatenated and fed into a linear classifier to produce the detection result. Experiments on the CSIC2010 and SR-BH2020 datasets validate the efficacy of WADBERT, which achieves F1-scores of 99.63% and 99.50%, respectively, and significantly outperforms state-of-the-art methods.
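The abstract names the fusion step (multi-head attention over SecBERT's parameter-level features) and the final classifier (linear, on the concatenated URL and payload features) but not their exact form. The pure-Python sketch below illustrates one plausible reading under stated assumptions: each parameter has already been encoded into a fixed-length vector; `queries`, `weight`, and `bias` stand in for learned parameters; and fusion is attention pooling with one learned query per head and no key/value projections, which may differ from WADBERT's actual layer. The function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def multihead_attention_pool(param_feats, queries):
    """Hypothetical fusion layer (assumed form, not WADBERT's published one):
    pool a non-empty, unordered set of parameter vectors into one payload
    vector. queries[h] is a learned query for head h; each head attends
    over its own slice of every parameter vector."""
    num_heads = len(queries)
    dim = len(param_feats[0])
    assert dim % num_heads == 0, "feature dim must split evenly across heads"
    d_head = dim // num_heads
    pooled, head_weights = [], []
    for h, q in enumerate(queries):
        lo, hi = h * d_head, (h + 1) * d_head
        slices = [x[lo:hi] for x in param_feats]
        # Scaled dot-product score of this head's query against each parameter.
        scores = [sum(a * b for a, b in zip(q, s)) / math.sqrt(d_head) for s in slices]
        w = softmax(scores)
        head_weights.append(w)
        # Softmax-weighted sum of the parameter slices for this head.
        pooled.extend(
            sum(w[i] * slices[i][j] for i in range(len(slices))) for j in range(d_head)
        )
    return pooled, head_weights


def classify(url_feat, payload_feat, weight, bias):
    """Linear classifier on the concatenated [URL; payload] feature,
    returning a sigmoid 'malicious' score."""
    z = [*url_feat, *payload_feat]
    assert len(weight) == len(z), "one weight per feature"
    logit = sum(w * x for w, x in zip(weight, z)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy example: two parameter vectors (dim 4) and two heads (d_head 2).
param_vecs = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, -0.5, 2.0]]
queries = [[1.0, 0.0], [0.0, 1.0]]
payload, weights = multihead_attention_pool(param_vecs, queries)
score = classify([0.2, -0.1], payload, weight=[0.1] * 6, bias=0.0)
```

Because each head's output is a softmax-weighted sum, reordering the parameters leaves the pooled vector unchanged, which is one way to make the payload feature independent of parameter order. The per-parameter entries in `weights` show which parameters each head attended to; whether WADBERT uses such weights for parameter-level localization is not stated in the abstract.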