Beyond a Single Perspective: Text Anomaly Detection with Multi-View Language Representations

📅 2026-01-25
🤖 AI Summary
Existing two-step text anomaly detection methods typically rely on a single embedding model, which limits their generalization across diverse datasets and anomaly types. To address this limitation, this work proposes MCA², a multi-view language representation framework that integrates embeddings from multiple pre-trained language models. MCA² captures inter-view complementarity through a multi-view reconstruction objective, a cross-view contrastive collaboration mechanism, and an adaptive view-weighting strategy. Experiments on ten benchmark datasets show that MCA² outperforms strong baselines, indicating better detection performance, generalization, and robustness across varied anomaly scenarios.
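The summary combines per-view reconstruction with adaptive view weighting. The sketch below shows one plausible way to fuse per-view reconstruction errors into a single anomaly score; it is an illustration, not the MCA² implementation. It assumes each view i supplies the text's embedding from pretrained model i plus a reconstruction of that embedding, and that view weights come from a softmax over per-view logits (`view_logits`, a hypothetical stand-in for whatever the adaptive allocation module actually learns). All function names are illustrative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def squared_error(x: Vector, y: Vector) -> float:
    """Squared L2 distance between an embedding and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns per-view logits into weights summing to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fused_anomaly_score(
    embeddings: Sequence[Vector],
    reconstructions: Sequence[Vector],
    view_logits: Sequence[float],
) -> float:
    """Weighted sum of per-view reconstruction errors for one text.

    embeddings[i]: the text's embedding from pretrained model i (view i).
    reconstructions[i]: the reconstruction produced for view i.
    view_logits[i]: hypothetical learnable score for view i (assumption).
    Higher return values mean the text looks less like the normal data.
    """
    if not (len(embeddings) == len(reconstructions) == len(view_logits)):
        raise ValueError("each view needs an embedding, a reconstruction and a logit")
    weights = softmax(view_logits)
    errors = [squared_error(e, r) for e, r in zip(embeddings, reconstructions)]
    return sum(w * err for w, err in zip(weights, errors))


# Two views: view 1 reconstructs the text perfectly, view 2 does not.
# Equal logits give equal weights, so the score is the mean error (0 + 4) / 2.
print(fused_anomaly_score([[1.0, 0.0], [0.0, 2.0]], [[1.0, 0.0], [0.0, 0.0]], [0.0, 0.0]))  # 2.0
```

Raising a view's logit shifts the score toward that view's error, which is the intuition behind letting the data decide how much each embedding model contributes.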

📝 Abstract
Text anomaly detection (TAD) plays a critical role in various language-driven real-world applications, including harmful content moderation, phishing detection, and spam review filtering. While two-step "embedding-detector" TAD methods have shown state-of-the-art performance, their effectiveness is often limited by the use of a single embedding model and the lack of adaptability across diverse datasets and anomaly types. To address these limitations, we propose to exploit the embeddings from multiple pretrained language models and integrate them into MCA², a multi-view TAD framework. MCA² adopts a multi-view reconstruction model to extract normal textual patterns from multiple embedding perspectives. To exploit inter-view complementarity, a contrastive collaboration module is designed to leverage and strengthen the interactions across different views. Moreover, an adaptive allocation module is developed to automatically assign the contribution weight of each view, thereby improving adaptability to diverse datasets. Extensive experiments on 10 benchmark datasets verify the effectiveness of MCA² against strong baselines. The source code of MCA² is available at https://github.com/yankehan/MCA2.
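The abstract describes the contrastive collaboration module only at a high level. As an illustration, a common way to make two views agree is an InfoNCE-style loss that pulls together two views of the same text and pushes apart views of different texts in a batch; this is a swapped-in generic technique, not the paper's stated loss. The sketch assumes both views were already projected to a shared dimension (embeddings from different pretrained models usually differ in size), uses cosine similarity with a temperature `tau`, and the function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(x: Vector, y: Vector) -> float:
    """Cosine similarity between two projected embeddings."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_x * norm_y)


def cross_view_info_nce(
    view_a: Sequence[Vector], view_b: Sequence[Vector], tau: float = 0.5
) -> float:
    """Mean InfoNCE loss from view A to view B over a batch of texts.

    For text j, (view_a[j], view_b[j]) is the positive pair and
    (view_a[j], view_b[k]) for every k != j is a negative pair.
    """
    n = len(view_a)
    if n == 0 or n != len(view_b):
        raise ValueError("views must be non-empty and cover the same texts")
    total = 0.0
    for j in range(n):
        logits = [cosine(view_a[j], view_b[k]) / tau for k in range(n)]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # -log( exp(positive logit) / sum_k exp(logit_k) )
        total += log_denominator - logits[j]
    return total / n


# Toy batch of two texts with 2-d projected embeddings per view.
anchor = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]   # view B agrees with view A
shuffled = [[0.0, 1.0], [1.0, 0.0]]  # view B paired with the wrong texts
print(cross_view_info_nce(anchor, aligned) < cross_view_info_nce(anchor, shuffled))  # True
```

Implementations of this kind of loss often average the A-to-B and B-to-A directions to make it symmetric; with more than two views, the same loss can be applied to every pair.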

Problem

Research questions and friction points this paper is trying to address.

Text Anomaly Detection
Multi-View Representation
Embedding Model
Anomaly Adaptability
Language Models

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-view representation
text anomaly detection
contrastive collaboration
adaptive allocation
pretrained language models

Yixin Liu
Research Fellow, School of ICT, Griffith University
Graph Neural Networks, Graph Anomaly Detection, LLM Agents
Kehan Yan
School of Computer, Electronics and Information, Guangxi University, China
Shiyuan Li
School of Information and Communication Technology, Griffith University, Australia
Qingfeng Chen
School of Computer, Electronics and Information, Guangxi University, China
Shirui Pan
Professor, ARC Future Fellow, FQA, Director of TrustAGI Lab, Griffith University
Data Mining, Machine Learning, Graph Neural Networks, Trustworthy AI, Time Series