A Split-Window Transformer for Multi-Modal Sequence Spammer Detection using a Multi-Modal Variational Autoencoder

📅 2025-02-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses two key challenges in multi-modal sequential spammer detection: cross-modal noise interference, and memory bottlenecks in attention computation caused by ultra-long user behavior sequences. To this end, the authors propose an efficient and robust modeling framework with two main components: (1) a behavior tokenization algorithm based on a multi-modal variational autoencoder (MVAE), which compresses cross-modal representations in a noise-robust way; and (2) a hierarchical split-window multi-head attention mechanism (SW/W-MHA) that drastically reduces the GPU memory consumption of long-sequence Transformers. Evaluated on public benchmarks, the method achieves significant performance gains over state-of-the-art approaches while using fewer parameters, and is the first to enable end-to-end, efficient modeling of ultra-long multi-modal sequences. Extensive experiments validate its effectiveness and generalizability as a backbone architecture for spammer detection.

📝 Abstract
This paper introduces a new Transformer, called MS$^2$Dformer, which can serve as a generalized backbone for multi-modal sequence spammer detection. Spammer detection is a complex multi-modal task, so the challenges of applying a Transformer are two-fold. First, complex, noisy multi-modal information about users can interfere with feature mining. Second, the long sequences of users' historical behaviors also put huge GPU memory pressure on the attention computation. To solve these problems, we first design a user behavior tokenization algorithm based on a multi-modal variational autoencoder (MVAE). Subsequently, a hierarchical split-window multi-head attention (SW/W-MHA) mechanism is proposed. The split-window strategy hierarchically transforms ultra-long sequences into a combination of intra-window short-term attention and inter-window overall attention. Pre-trained on public datasets, MS$^2$Dformer's performance far exceeds the previous state of the art. The experiments demonstrate MS$^2$Dformer's ability to act as a backbone.
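The split-window idea in the abstract can be illustrated with a minimal single-head NumPy sketch. This is not the paper's implementation: the non-overlapping windows, mean-pooled window summaries, and the residual broadcast of inter-window context back to each position are simplifying assumptions used only to show how intra-window and inter-window attention combine to avoid the full O(L^2) attention matrix.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention over the last two axes.
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v

def split_window_attention(x, window=16):
    """Hierarchical split-window attention (single head, illustrative).

    Stage 1: full attention inside each non-overlapping window
             (intra-window short-term attention, cost O(L * window)).
    Stage 2: attention across mean-pooled window summaries
             (inter-window overall attention, cost O((L/window)^2)).
    """
    L, d = x.shape
    assert L % window == 0, "pad the sequence to a multiple of the window size"
    n_win = L // window

    # Intra-window short-term attention.
    xw = x.reshape(n_win, window, d)
    intra = attention(xw, xw, xw)                       # (n_win, window, d)

    # Inter-window overall attention over pooled summaries.
    summaries = intra.mean(axis=1)                      # (n_win, d)
    inter = attention(summaries, summaries, summaries)  # (n_win, d)

    # Broadcast the inter-window context back to every position.
    out = intra + inter[:, None, :]
    return out.reshape(L, d)

rng = np.random.default_rng(0)
tokens = rng.standard_normal((64, 32))   # 64 behavior tokens, dim 32
out = split_window_attention(tokens, window=16)
print(out.shape)  # (64, 32)
```

With L = 64 and window = 16, the sketch computes four 16x16 attention maps plus one 4x4 map instead of a single 64x64 map, which is the memory saving the paper attributes to the split-window strategy.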
Problem

Research questions and friction points this paper is trying to address.

Detect multi-modal sequence spammers effectively
Handle noisy multi-modal user information
Reduce GPU memory in long sequence attention
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal Variational Autoencoder
Split-Window Attention Mechanism
User Behavior Tokenization Algorithm
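The MVAE-based tokenization listed above can be sketched as follows. This is an assumption-laden illustration, not the paper's algorithm: the single-linear-layer encoders, the product-of-experts fusion of the two modality posteriors, and the feature dimensions are all hypothetical choices; only the overall pattern (encode each modality, fuse, sample one latent "behavior token" via the reparameterization trick) reflects the description on this page.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_modality(feat, W, b):
    # Per-modality encoder: one linear layer producing mu and log-variance.
    h = feat @ W + b
    d = h.shape[-1] // 2
    return h[..., :d], h[..., d:]          # mu, logvar

def mvae_tokenize(text_feat, image_feat, params, latent_dim=8):
    """Compress one user behavior (text + image features) into one token.

    The two modality posteriors are fused with a product-of-experts, a
    common choice for multi-modal VAEs (an assumption here, not
    necessarily the paper's exact fusion rule).
    """
    mu_t, lv_t = encode_modality(text_feat, *params["text"])
    mu_i, lv_i = encode_modality(image_feat, *params["image"])

    # Product-of-experts fusion: precision-weighted average of the means.
    prec_t, prec_i = np.exp(-lv_t), np.exp(-lv_i)
    prec = prec_t + prec_i + 1.0           # +1.0 for the standard-normal prior
    mu = (mu_t * prec_t + mu_i * prec_i) / prec
    var = 1.0 / prec

    # Reparameterization trick: the sampled latent is the behavior token.
    eps = rng.standard_normal(latent_dim)
    return mu + np.sqrt(var) * eps

latent_dim = 8
params = {  # hypothetical untrained weights, for shape illustration only
    "text":  (rng.standard_normal((128, 2 * latent_dim)) * 0.1,
              np.zeros(2 * latent_dim)),
    "image": (rng.standard_normal((256, 2 * latent_dim)) * 0.1,
              np.zeros(2 * latent_dim)),
}
token = mvae_tokenize(rng.standard_normal(128), rng.standard_normal(256), params)
print(token.shape)  # (8,)
```

Each historical behavior thus becomes one compact latent vector, and the sequence of these tokens is what the split-window Transformer attends over.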
Zhou Yang
School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
Yucai Pang
School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
Hongbo Yin
University of Leeds
CVNLP
Yunpeng Xiao
School of Communications and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China