HarmoniAD: Harmonizing Local Structures and Global Semantics for Anomaly Detection

📅 2026-01-01
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of detecting minute defects in industrial quality inspection, where imbalanced modeling of local structures and global semantics often leads to noise sensitivity or loss of fine details. To this end, we propose HarmoniAD, a frequency-domain-guided dual-branch framework that decouples CLIP features into high- and low-frequency components. The high-frequency branch employs a Fine-grained Structural Attention Module (FSAM) to enhance texture and edge details, while the low-frequency branch leverages a Global Structural Context Module (GSCM) to capture long-range dependencies. By jointly optimizing local fidelity and global semantic coherence, HarmoniAD achieves state-of-the-art performance on MVTec-AD, VisA, and BTAD benchmarks, demonstrating both high sensitivity to subtle anomalies and strong robustness.

📝 Abstract
Anomaly detection is crucial in industrial product quality inspection. Failing to detect tiny defects often leads to serious consequences. Existing methods face a structure-semantics trade-off: structure-oriented models (such as frequency-based filters) are noise-sensitive, while semantics-oriented models (such as CLIP-based encoders) often miss fine details. To address this, we propose HarmoniAD, a frequency-guided dual-branch framework. Features are first extracted by the CLIP image encoder, then transformed into the frequency domain, and finally decoupled into high- and low-frequency paths for complementary modeling of structure and semantics. The high-frequency branch is equipped with a fine-grained structural attention module (FSAM) to enhance textures and edges for detecting small anomalies, while the low-frequency branch uses a global structural context module (GSCM) to capture long-range dependencies and preserve semantic consistency. Together, these branches balance fine detail and global semantics. HarmoniAD further adopts a multi-class joint training strategy, and experiments on MVTec-AD, VisA, and BTAD show state-of-the-art performance with both sensitivity and robustness.
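The frequency decoupling step described in the abstract can be illustrated with a minimal sketch. It assumes a 2D FFT over the spatial grid of a feature map and a hard radial cutoff. The abstract does not specify the transform, the cutoff, or the tensor layout, so the function name `split_frequency_bands`, the `cutoff` parameter, and the `(C, H, W)` layout are all hypothetical, and a random array stands in for CLIP features. FSAM and GSCM are not reproduced here.

```python
import numpy as np


def split_frequency_bands(features: np.ndarray, cutoff: float = 0.25):
    """Split a (C, H, W) feature map into low- and high-frequency parts.

    `cutoff` is an assumed radius in cycles per sample (each axis spans
    -0.5 to 0.5); it is an illustrative choice, not a value from the paper.
    """
    _, h, w = features.shape

    # Frequency coordinate of every FFT bin along each spatial axis.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]

    # Boolean (H, W) mask that keeps bins within the cutoff radius.
    low_mask = np.sqrt(fy ** 2 + fx ** 2) <= cutoff

    # Transform each channel, mask the spectrum, and transform back.
    spectrum = np.fft.fft2(features, axes=(-2, -1))
    low = np.fft.ifft2(spectrum * low_mask, axes=(-2, -1)).real
    high = np.fft.ifft2(spectrum * ~low_mask, axes=(-2, -1)).real
    return low, high


# Stand-in for encoder features: 4 channels on a 16x16 spatial grid.
rng = np.random.default_rng(0)
feats = rng.standard_normal((4, 16, 16))
low, high = split_frequency_bands(feats)

# The two masks are complementary, so the bands sum back to the input.
print(np.allclose(low + high, feats))
```

Because the masks partition the spectrum, no information is lost by the split itself. In a dual-branch design of this kind, `high` would feed the structure-oriented branch and `low` the semantics-oriented branch before the two are fused.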
Problem

Research questions and friction points this paper is trying to address.

anomaly detection
structure-semantics trade-off
fine-grained defects
industrial inspection
frequency domain
Innovation

Methods, ideas, or system contributions that make the work stand out.

frequency-guided dual-branch
structural attention
global semantics
anomaly detection
multi-class joint training
Naiqi Zhang
Tianjin University of Science and Technology
Chuancheng Shi
The University of Sydney
Jingtong Dou
The University of Sydney
Wenhua Wu
Shanghai Jiao Tong University
computer vision
Fei Shen
National University of Singapore
Controllable Generation, Multimodal Safety
Jianhua Cao
Tianjin University of Science and Technology