🤖 AI Summary
Existing text anomaly detection methods rely on a single embedding model, limiting their generalization across diverse datasets and anomaly types. To address this limitation, this work proposes MCA², a multi-view language representation framework that integrates embeddings from multiple pre-trained language models. MCA² effectively captures inter-view complementarity through a multi-view reconstruction objective, a cross-view contrastive collaboration mechanism, and an adaptive view-weighting strategy. Extensive experiments on ten benchmark datasets demonstrate that MCA² significantly outperforms strong baselines, highlighting its superior detection performance, enhanced generalization capability, and robustness across varied anomaly scenarios.
📝 Abstract
Text anomaly detection (TAD) plays a critical role in various language-driven real-world applications, including harmful content moderation, phishing detection, and spam review filtering. While two-step "embedding-detector" TAD methods have shown state-of-the-art performance, their effectiveness is often limited by the use of a single embedding model and the lack of adaptability across diverse datasets and anomaly types. To address these limitations, we propose to exploit the embeddings from multiple pre-trained language models and integrate them into $MCA^2$, a multi-view TAD framework. $MCA^2$ adopts a multi-view reconstruction model to effectively extract normal textual patterns from multiple embedding perspectives. To exploit inter-view complementarity, a contrastive collaboration module is designed to leverage and strengthen the interactions across different views. Moreover, an adaptive allocation module is developed to automatically assign the contribution weight of each view, thereby improving the adaptability to diverse datasets. Extensive experiments on 10 benchmark datasets verify the effectiveness of $MCA^2$ against strong baselines. The source code of $MCA^2$ is available at https://github.com/yankehan/MCA2.
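The abstract does not spell out $MCA^2$'s architecture, but the core two-step "embedding-detector" idea with multiple views can be sketched minimally. In the sketch below, three random matrices stand in for embeddings of the same texts from three different pre-trained language models; a rank-k PCA reconstruction stands in for the learned multi-view reconstruction model, and a softmax over negative mean errors stands in for the adaptive view-weighting module. All of these stand-ins are illustrative assumptions, not the paper's actual components (the contrastive collaboration module is omitted entirely).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 3 "views" = embeddings of the same 100 texts
# produced by three different pre-trained language models (dims 8, 12, 16).
views = [rng.normal(size=(100, d)) for d in (8, 12, 16)]

def reconstruction_error(X, k=4):
    """Per-sample error of a rank-k PCA reconstruction of one view.
    An illustrative stand-in for the paper's multi-view reconstruction model,
    which learns normal textual patterns per embedding perspective."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    recon = Xc @ Vt[:k].T @ Vt[:k]
    return np.linalg.norm(Xc - recon, axis=1)

# One error column per view: shape (100, 3).
errors = np.stack([reconstruction_error(V) for V in views], axis=1)

# Simplified adaptive view weighting: views that reconstruct the data well
# (low mean error) get higher weight via a softmax over negative mean error.
logits = -errors.mean(axis=0)
weights = np.exp(logits - logits.max())
weights /= weights.sum()

# Final anomaly score per text: weighted combination of per-view errors.
scores = errors @ weights
top_anomalies = np.argsort(scores)[-5:]  # indices of the 5 most anomalous texts
```

In $MCA^2$ the weights are learned jointly with the reconstruction and contrastive objectives rather than computed post hoc as here; the sketch only conveys how per-view anomaly evidence can be fused adaptively.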