HAAF: Hierarchical Adaptation and Alignment of Foundation Models for Few-Shot Pathology Anomaly Detection

📅 2026-01-24

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the mismatch between semantic and visual granularity in fine-grained anomaly detection on histopathological images, particularly in few-shot settings where general-purpose vision-language models struggle to adapt to texture-rich local lesions. The paper proposes the Hierarchical Adaptation and Alignment Framework (HAAF), built around a Cross-Level Scaled Alignment (CLSA) mechanism that works in sequence: visual features first inject context into the text prompts to produce content-adaptive descriptors, and those descriptors then guide the visual encoder toward anomalous regions. HAAF also uses a dual-branch inference strategy that combines semantic scores with geometric prototypes. In experiments on four histopathology benchmarks, HAAF is reported to significantly outperform existing methods and to scale effectively with domain-specific backbones such as CONCH under low-resource conditions.
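The dual-branch idea can be illustrated with a small sketch. The abstract states only that semantic scores are integrated with geometric prototypes; everything else here is an assumption made for illustration, including the softmax over cosine similarities, the mean-of-support prototype, the cosine-distance score, the linear fusion weight `alpha`, and all function names. It is not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def semantic_score(image_feat, text_feats, temperature=0.07):
    """Semantic branch (assumed form): softmax over cosine similarities to
    two prompt embeddings, row 0 = "normal", row 1 = "abnormal"."""
    sims = l2_normalize(text_feats) @ l2_normalize(image_feat)  # shape (2,)
    logits = sims / temperature
    logits = logits - logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return float(probs[1])

def geometric_score(image_feat, normal_support):
    """Geometric branch (assumed form): cosine distance to the mean of the
    few-shot normal support features, mapped to roughly [0, 1]."""
    prototype = l2_normalize(normal_support.mean(axis=0))
    cos = float(l2_normalize(image_feat) @ prototype)
    return 0.5 * (1.0 - cos)

def fused_score(image_feat, text_feats, normal_support, alpha=0.5):
    """Hypothetical linear fusion of the two branches; alpha is illustrative."""
    sem = semantic_score(image_feat, text_feats)
    geo = geometric_score(image_feat, normal_support)
    return alpha * sem + (1.0 - alpha) * geo

# Toy 4-dimensional features standing in for encoder outputs.
rng = np.random.default_rng(0)
normal_dir = np.array([1.0, 0.0, 0.0, 0.0])
abnormal_dir = np.array([0.0, 1.0, 0.0, 0.0])
text_feats = np.stack([normal_dir, abnormal_dir])
normal_support = normal_dir + 0.05 * rng.normal(size=(4, 4))  # 4-shot support set

score_normal = fused_score(normal_dir, text_feats, normal_support)
score_abnormal = fused_score(abnormal_dir, text_feats, normal_support)
print(score_normal, score_abnormal)
```

In a design like this, the prototype branch does not depend on prompt wording, which is one plausible reason a geometric term could add stability when only a handful of labeled samples are available.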

📝 Abstract

Precision pathology relies on detecting fine-grained morphological abnormalities within specific Regions of Interest (ROIs), as these local, texture-rich cues - rather than global slide contexts - drive expert diagnostic reasoning. While Vision-Language (V-L) models promise data efficiency by leveraging semantic priors, adapting them faces a critical Granularity Mismatch, where generic representations fail to resolve such subtle defects. Current adaptation methods often treat modalities as independent streams, failing to ground semantic prompts in ROI-specific visual contexts. To bridge this gap, we propose the Hierarchical Adaptation and Alignment Framework (HAAF). At its core is a novel Cross-Level Scaled Alignment (CLSA) mechanism that enforces a sequential calibration order: visual features first inject context into text prompts to generate content-adaptive descriptors, which then spatially guide the visual encoder to spotlight anomalies. Additionally, a dual-branch inference strategy integrates semantic scores with geometric prototypes to ensure stability in few-shot settings. Experiments on four benchmarks show HAAF significantly outperforms state-of-the-art methods and effectively scales with domain-specific backbones (e.g., CONCH) in low-resource scenarios.
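The sequential calibration order described for CLSA (visual context into text first, then text-guided spatial focus) can be sketched as two steps. The abstract does not give the actual operators, so the single-head cross-attention with a residual update, the softmax anomaly map, the patch reweighting rule, and the function names `visual_to_text` and `text_to_visual` are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def visual_to_text(prompts, patches):
    """Step 1 (assumed form): prompt embeddings attend over ROI patch
    features and take a residual update, giving content-adaptive prompts."""
    d = prompts.shape[-1]
    attn = softmax(prompts @ patches.T / np.sqrt(d), axis=-1)  # (K, N)
    return prompts + attn @ patches                            # (K, d)

def text_to_visual(patches, adapted_prompts, temperature=0.07):
    """Step 2 (assumed form): adapted prompts score every patch; the
    "abnormal" probability (prompt index 1) reweights the patch features."""
    sims = l2_normalize(patches) @ l2_normalize(adapted_prompts).T  # (N, K)
    probs = softmax(sims / temperature, axis=-1)
    anomaly_map = probs[:, 1]
    guided = patches * (1.0 + anomaly_map[:, None])
    return guided, anomaly_map

# Toy inputs: 16 patch features and 2 prompts (normal, abnormal), dimension 8.
rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))
prompts = rng.normal(size=(2, 8))

adapted = visual_to_text(prompts, patches)              # visual -> text first
guided, anomaly_map = text_to_visual(patches, adapted)  # then text -> visual
print(adapted.shape, guided.shape, anomaly_map.shape)
```

The point of the ordering in this sketch is that step 2 consumes the output of step 1, so the spatial guidance is computed from prompts that have already seen the ROI content rather than from generic, image-independent prompts.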

Problem

Research questions and friction points this paper is trying to address.

Few-Shot Pathology Anomaly Detection

Granularity Mismatch

Vision-Language Models

Region of Interest (ROI)

Semantic Alignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Adaptation

Cross-Level Scaled Alignment

Vision-Language Models