Beyond Detection: A Comprehensive Benchmark and Study on Representation Learning for Fine-Grained Webshell Family Classification

📅 2025-12-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Malicious WebShells pose persistent threats to critical infrastructure (e.g., healthcare, finance), yet existing research focuses predominantly on binary detection and lacks automated, fine-grained classification of WebShell families. To address this gap, we present the first systematic study of WebShell family classification. The pipeline extracts dynamic function call traces, augments them with variants synthesized by large language models, and abstracts the result into sequences, graphs, and trees. On these abstractions we benchmark sequence embeddings (CBOW, GloVe), transformers (BERT, SimCSE), and structure-aware methods (Graph Kernels, Graph Edit Distance, Graph2Vec, Graph Neural Networks). Experiments on four real-world, family-annotated datasets under supervised and unsupervised settings establish robust baselines and offer practical guidance on which combinations of data abstraction, representation model, and learning paradigm work best, supporting a shift from passive detection toward proactive defense.

📝 Abstract

Malicious WebShells pose a significant and evolving threat by compromising critical digital infrastructures and endangering public services in sectors such as healthcare and finance. While the research community has made significant progress in WebShell detection (i.e., distinguishing malicious samples from benign ones), we argue that it is time to transition from passive detection to in-depth analysis and proactive defense. One promising direction is the automation of WebShell family classification, which involves identifying the specific malware lineage in order to understand an adversary's tactics and enable a precise, rapid response. This crucial task, however, remains a largely unexplored area that currently relies on slow, manual expert analysis. To address this gap, we present the first systematic study to automate WebShell family classification. Our method begins with extracting dynamic function call traces to capture inherent behaviors that are resistant to common encryption and obfuscation. To enhance the scale and diversity of our dataset for a more stable evaluation, we augment these real-world traces with new variants synthesized by Large Language Models. These augmented traces are then abstracted into sequences, graphs, and trees, providing a foundation to benchmark a comprehensive suite of representation methods. Our evaluation spans classic sequence-based embeddings (CBOW, GloVe), transformers (BERT, SimCSE), and a range of structure-aware algorithms, including Graph Kernels, Graph Edit Distance, Graph2Vec, and various Graph Neural Networks. Through extensive experiments on four real-world, family-annotated datasets under both supervised and unsupervised settings, we establish a robust baseline and provide practical insights into the most effective combinations of data abstractions, representation models, and learning paradigms for this challenge.
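The abstraction step described in the abstract (traces turned into sequences, graphs, and trees) can be illustrated with a minimal sketch. The trace format used here, a list of `(call depth, function name)` pairs, is an assumption made for illustration, as are the helper names `to_sequence`, `to_graph`, and `to_tree`; the paper's actual trace format and abstraction rules are not given in this summary.

```python
from collections import Counter

# Hypothetical trace format (an assumption, not the paper's):
# (call depth, function name) pairs in execution order.
trace = [
    (0, "main"),
    (1, "base64_decode"),
    (1, "eval"),
    (2, "system"),
    (1, "ob_get_clean"),
]

def to_sequence(trace):
    """Flatten the trace into an ordered list of function names."""
    return [name for _, name in trace]

def to_graph(trace):
    """Count transitions between consecutive calls as weighted directed edges."""
    names = to_sequence(trace)
    return Counter(zip(names, names[1:]))

def to_tree(trace):
    """Rebuild the call tree from depths; each node is (name, children)."""
    root = ("<root>", [])
    stack = [(-1, root)]  # (depth, node) pairs along the current call path
    for depth, name in trace:
        node = (name, [])
        # Pop until the top of the stack is the caller of this call.
        while stack[-1][0] >= depth:
            stack.pop()
        stack[-1][1][1].append(node)
        stack.append((depth, node))
    return root

sequence = to_sequence(trace)
graph = to_graph(trace)
tree = to_tree(trace)
```

Each view keeps different information: the sequence preserves order, the graph preserves which calls follow which, and the tree preserves caller/callee nesting. That difference is what makes it meaningful to benchmark representation methods per abstraction.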

Problem

Research questions and friction points this paper is trying to address.

Automating WebShell family classification to understand adversary tactics

Enhancing dataset diversity with LLM-synthesized variants for stable evaluation

Benchmarking representation methods for effective malware lineage identification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic function call trace extraction for behavior capture

LLM-synthesized variant augmentation for dataset diversity

Multi-representation benchmarking with sequence, graph, and tree abstractions
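To make the supervised setting concrete, the sketch below classifies a call sequence by family using a deliberately simple stand-in: bag-of-calls count vectors compared by cosine similarity against per-family centroids. This is not one of the methods the paper benchmarks (CBOW, GloVe, BERT, SimCSE, graph methods); it only shows the shape of the task. The family labels and traces are invented toy data, and all function names are hypothetical.

```python
import math
from collections import Counter

def embed(seq):
    """Bag-of-calls vector: function name -> occurrence count."""
    return Counter(seq)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def fit_centroids(labeled):
    """Sum the vectors of each family; cosine similarity ignores the scale."""
    centroids = {}
    for family, seq in labeled:
        centroids.setdefault(family, Counter()).update(embed(seq))
    return centroids

def predict(centroids, seq):
    """Return the family whose centroid is most similar to the sequence."""
    vec = embed(seq)
    return max(centroids, key=lambda fam: cosine(centroids[fam], vec))

# Invented toy data: family names and traces are hypothetical.
train = [
    ("family_a", ["base64_decode", "eval", "system"]),
    ("family_a", ["base64_decode", "eval", "exec"]),
    ("family_b", ["fopen", "fwrite", "fclose"]),
    ("family_b", ["move_uploaded_file", "fopen", "fwrite"]),
]
centroids = fit_centroids(train)
label = predict(centroids, ["gzinflate", "base64_decode", "eval"])
print(label)
```

A bag-of-calls vector discards call order and nesting, which is exactly the information the graph and tree abstractions retain.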
