OFF-CLIP: Improving Normal Detection Confidence in Radiology CLIP with Simple Off-Diagonal Term Auto-Adjustment

📅 2025-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
In radiology, CLIP-based models exhibit high false-positive and false-negative rates in zero-shot normal image detection, primarily due to intra-class variability among normal samples and semantic misalignment caused by spurious “normal” descriptors embedded in abnormal radiology reports. To address this, we propose a dual-path optimization framework: (1) an off-diagonal adaptive loss that enforces tighter clustering of normal samples in the joint embedding space; and (2) a lightweight sentence-level text filtering module that removes misleading “normal” phrases from abnormal reports, mitigating image–text semantic conflict. Our method introduces the first dynamic gradient modulation mechanism for off-diagonal terms in contrastive learning, significantly improving the robustness of normal detection without compromising abnormal classification performance. On VinDr-CXR, our approach achieves a +0.61 AUC gain over CARZero for normal classification, along with improved localization accuracy, demonstrating enhanced capability for abnormal region identification.
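The summary does not spell out the exact form of the off-diagonal loss, so the sketch below is a hypothetical NumPy illustration of the general idea: a standard symmetric InfoNCE term on matched (diagonal) image–report pairs, plus an extra term that pulls normal–normal off-diagonal pairs together instead of treating them as negatives. The function name `off_clip_loss` and the `off_weight` parameter are assumptions for illustration, not the paper's API.

```python
import numpy as np

def off_clip_loss(img_emb, txt_emb, is_normal, temperature=0.07, off_weight=0.5):
    """Hypothetical sketch (not the paper's exact formulation) of a CLIP-style
    loss with an off-diagonal term that encourages normal-sample clustering.

    img_emb, txt_emb: (N, D) L2-normalized image/report embeddings.
    is_normal: (N,) boolean mask marking normal cases in the batch.
    """
    n = len(img_emb)
    logits = img_emb @ txt_emb.T / temperature            # (N, N) similarities

    def xent(mat):
        # Mean cross-entropy with matched pairs on the diagonal as targets.
        mat = mat - mat.max(axis=1, keepdims=True)        # numerical stability
        log_probs = mat - np.log(np.exp(mat).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Standard symmetric InfoNCE over images->texts and texts->images.
    clip_loss = 0.5 * (xent(logits) + xent(logits.T))

    # Off-diagonal term: pull normal-normal (image, text) pairs together
    # rather than repelling them as contrastive negatives.
    pairs = np.outer(is_normal, is_normal) & ~np.eye(n, dtype=bool)
    if pairs.any():
        sims = (img_emb @ txt_emb.T)[pairs]
        off_loss = (1.0 - sims).mean()
    else:
        off_loss = 0.0

    return clip_loss + off_weight * off_loss
```

Under this construction, batches containing several normal samples receive an additional gradient pulling their embeddings together, which is one plausible way to realize the tighter normal-sample clustering described above.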

📝 Abstract
Contrastive Language-Image Pre-Training (CLIP) has enabled zero-shot classification in radiology, reducing reliance on manual annotations. However, conventional contrastive learning struggles with normal case detection due to its strict intra-sample alignment, which disrupts normal sample clustering and leads to high false positives (FPs) and false negatives (FNs). To address these issues, we propose OFF-CLIP, a contrastive learning refinement that improves normal detection by introducing an off-diagonal term loss to enhance normal sample clustering and applying sentence-level text filtering to mitigate FNs by removing misaligned normal statements from abnormal reports. OFF-CLIP can be applied to radiology CLIP models without requiring any architectural modifications. Experimental results show that OFF-CLIP significantly improves normal classification, achieving a 0.61 Area under the curve (AUC) increase on VinDr-CXR over CARZero, the state-of-the-art zero-shot classification baseline, while maintaining or improving abnormal classification performance. Additionally, OFF-CLIP enhances zero-shot grounding by improving pointing game accuracy, confirming better anomaly localization. These results demonstrate OFF-CLIP's effectiveness as a robust and efficient enhancement for medical vision-language models.
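The sentence-level text filtering step could be sketched as follows. The keyword list and function name (`filter_abnormal_report`) are illustrative assumptions; the paper's actual filtering criteria may differ.

```python
import re

# Hypothetical keyword list of "normal" phrasing; the paper's actual
# filtering criteria are not specified in this summary.
NORMAL_PATTERNS = re.compile(
    r"\b(no acute|unremarkable|normal|clear|within normal limits)\b",
    re.IGNORECASE,
)

def filter_abnormal_report(report, is_abnormal=True):
    """Sentence-level filtering sketch: drop misaligned 'normal' statements
    from abnormal reports so they no longer pull abnormal images toward
    normal text in the joint embedding space."""
    if not is_abnormal:
        return report
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", report.strip())
    kept = [s for s in sentences if not NORMAL_PATTERNS.search(s)]
    return " ".join(kept)
```

For example, `filter_abnormal_report("The heart size is normal. There is a right lower lobe opacity.")` keeps only the abnormal finding, `"There is a right lower lobe opacity."`.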
Problem

Research questions and friction points this paper is trying to address.

High false-positive and false-negative rates in zero-shot normal case detection with radiology CLIP models
Strict intra-sample alignment in contrastive learning disrupts normal-sample clustering
Misaligned “normal” statements in abnormal reports cause image–text semantic conflict
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces an off-diagonal term loss to enhance normal-sample clustering
Applies sentence-level text filtering to remove misaligned normal statements from abnormal reports
Plugs into existing radiology CLIP models without architectural modifications