AI Summary
In unsupervised anomaly detection, conventional methods struggle to precisely model the normal boundary due to data scarcity and confounding attributes, and fail to identify user-specified anomalies. To address this, we propose a language-guided feature transformation framework, the first to leverage the cross-modal shared embedding space of vision-language models (e.g., CLIP) for unsupervised anomaly detection. Our method employs natural language instructions to drive semantic-level feature recalibration, enabling fine-grained, interpretable, and user-preference-aligned detection. It supports prompt-based feature projection and is plug-and-play compatible with mainstream detectors (e.g., GAN- or reconstruction-based). Extensive experiments on multiple real-world benchmarks demonstrate an average 18.7% improvement in target anomaly recall while maintaining high accuracy on normal samples, validating both effectiveness and generalizability.
Abstract
This paper introduces LAFT, a novel feature transformation method designed to incorporate user knowledge and preferences into anomaly detection using natural language. Accurately modeling the boundary of normality is crucial for distinguishing abnormal data, but this is often challenging due to limited data or the presence of nuisance attributes. While unsupervised methods that rely solely on data without user guidance are common, they may fail to detect anomalies of specific interest. To address this limitation, we propose Language-Assisted Feature Transformation (LAFT), which leverages the shared image-text embedding space of vision-language models to transform visual features according to user-defined requirements. Combined with anomaly detection methods, LAFT effectively aligns visual features with user preferences, allowing anomalies of interest to be detected. Extensive experiments on both toy and real-world datasets validate the effectiveness of our method.
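The abstract describes transforming visual features in a shared image-text embedding space so that only user-named attributes influence detection. One way such a prompt-based projection could work is to span a "concept subspace" with difference vectors of paired text prompts and project image features onto it. The sketch below illustrates that idea with random stand-in embeddings in place of real CLIP text/image encodings; the function name `concept_projection` and the exact mechanism are assumptions for illustration, not the paper's specified implementation.

```python
import numpy as np

def concept_projection(text_pairs):
    """Hypothetical sketch: build a projector onto the subspace spanned
    by prompt-pair difference vectors (e.g., embeddings of
    "a photo of a red object" minus "a photo of a blue object")."""
    # Stack the difference vectors of each prompt pair: shape (k, d).
    diffs = np.stack([a - b for a, b in text_pairs])
    # Orthonormal basis of the concept subspace via reduced QR: Q is (d, k).
    Q, _ = np.linalg.qr(diffs.T)
    # Orthogonal projection matrix onto the subspace: shape (d, d).
    return Q @ Q.T

# Toy example with stand-in 8-d "text embeddings".
rng = np.random.default_rng(0)
pairs = [(rng.normal(size=8), rng.normal(size=8)) for _ in range(2)]
P = concept_projection(pairs)

image_feat = rng.normal(size=8)
kept = P @ image_feat         # component along the user-named concepts
ignored = image_feat - kept   # nuisance component a detector could discard
```

Downstream, an off-the-shelf anomaly scorer (e.g., nearest-neighbor distance) would operate on `kept` rather than the raw feature, which is one way the "plug-and-play" compatibility with existing detectors could be realized.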