Harmonized Feature Conditioning and Frequency-Prompt Personalization for Multi-Rater Medical Segmentation

📅 2026-05-06

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenges of model overconfidence and inadequate uncertainty calibration in multi-annotator medical image segmentation, which arise from inter-expert interpretation discrepancies and imaging artifacts. To this end, the authors propose a cohesive probabilistic framework that employs a lightweight Harmonizer Network to adaptively modulate features, disentangling imaging artifacts from genuine annotation variability. Additionally, a high-frequency prompting module encodes annotator-specific boundary styles in the frequency domain, while generalized energy distance (GED) regularization enforces anatomical consistency and distributional alignment. Evaluated on the LIDC-IDRI and NPC-170 datasets, the method notably reduces GED, improves Dice scores (particularly under noisy conditions), and yields clinically interpretable uncertainty estimates.
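To make "encoding boundary styles in the frequency domain" concrete, here is a minimal sketch of isolating the high-frequency content of a 2D image with a Fourier-domain mask. It is an illustration only, not the paper's prompt module: the function name `high_frequency_component`, the radial cutoff of 0.1 cycles per pixel, and the hard (non-learned) mask are all assumptions made for this example.

```python
import numpy as np


def high_frequency_component(image, cutoff=0.1):
    """Return the part of a 2D image above a radial frequency cutoff.

    `cutoff` is in cycles per pixel (hypothetical default, chosen for
    illustration). A hard mask is used here; a learned prompt would
    replace it with trainable spectral weights.
    """
    h, w = image.shape
    # Frequency coordinates for each FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    # Keep only bins whose radial frequency exceeds the cutoff.
    mask = radius > cutoff

    spectrum = np.fft.fft2(image)
    # The mask is symmetric in frequency, so the result is real up to
    # floating-point error; drop the negligible imaginary part.
    return np.real(np.fft.ifft2(spectrum * mask))


# Usage: a checkerboard is pure high frequency plus a constant offset,
# so the filter removes only the offset.
rows, cols = np.indices((8, 8))
board = ((rows + cols) % 2).astype(float)
edges = high_frequency_component(board)
```

Sharp edges and fine texture live in the retained bins, which is why a spectral representation is a natural place to express how tightly or loosely an annotator traces a boundary.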

📝 Abstract

Multi-rater medical image segmentation captures the inherent ambiguity of clinical interpretation, where diagnostic boundaries vary across experts and imaging devices. Existing approaches often reduce this diversity to consensus labels or treat rater differences as noise, resulting in overconfident and poorly calibrated models. We propose a harmonized probabilistic framework that disentangles acquisition artifacts from genuine annotator variability through adaptive feature conditioning and frequency-domain personalization. A lightweight Harmonizer Network implicitly models scanner-specific artifacts and performs dynamic feature modulation to standardize latent representations, ensuring that uncertainty reflects anatomy rather than noise. To represent rater-specific styles, we introduce novel High-Frequency Prompt Modules that operate in the spectral domain to encode annotator-dependent boundary precision and textural sensitivity. These prompts adaptively modulate harmonized features to produce personalized yet anatomically consistent segmentations. Furthermore, a regularization term based on the Generalized Energy Distance (GED) aligns the generative distribution with empirical annotation variability, promoting diversity where experts disagree and consensus where they converge. Experiments on LIDC-IDRI and NPC-170 show state-of-the-art aggregated and individualized segmentation performance, with notable GED reductions and improved Dice scores, especially on noisy cases. Beyond accuracy, the model exhibits clinically meaningful uncertainty: confidence rises in agreement regions and declines in ambiguous areas, supporting its use as a reliable and interpretable tool for multi-expert clinical workflows.
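For readers unfamiliar with the metric, the squared Generalized Energy Distance between model samples S and expert annotations Y is commonly written as 2·E[d(S, Y)] − E[d(S, S′)] − E[d(Y, Y′)]. The sketch below computes it for binary masks with d = 1 − IoU, a common choice in the ambiguous-segmentation literature. It is an evaluation-style illustration under those assumptions; the abstract does not specify the distance used or the differentiable form used for regularization, and the function names are hypothetical.

```python
import numpy as np


def iou_distance(a, b):
    """Distance d = 1 - IoU between two binary masks (assumed choice of d)."""
    a = np.asarray(a).astype(bool)
    b = np.asarray(b).astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Two empty masks are treated as identical.
        return 0.0
    intersection = np.logical_and(a, b).sum()
    return float(1.0 - intersection / union)


def mean_pairwise_distance(xs, ys):
    """Average distance over all (x, y) pairs, including identical pairs."""
    return float(np.mean([iou_distance(x, y) for x in xs for y in ys]))


def generalized_energy_distance(samples, annotations):
    """Squared GED: 2*E[d(S, Y)] - E[d(S, S')] - E[d(Y, Y')]."""
    cross = mean_pairwise_distance(samples, annotations)
    within_samples = mean_pairwise_distance(samples, samples)
    within_annotations = mean_pairwise_distance(annotations, annotations)
    return 2.0 * cross - within_samples - within_annotations


# Usage: two raters who disagree completely, and a model that
# reproduces both of their masks.
top = np.zeros((4, 4), dtype=bool)
top[:2] = True
bottom = np.zeros((4, 4), dtype=bool)
bottom[2:] = True
ged_matched = generalized_energy_distance([top, bottom], [top, bottom])
ged_collapsed = generalized_energy_distance([top], [top, bottom])
```

Here `ged_matched` vanishes because the sample set reproduces the annotation set, while `ged_collapsed` is positive: a model that always emits a single rater's mask is penalized for missing the diversity that the experts actually exhibit.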

Problem

Research questions and friction points this paper is trying to address.

multi-rater segmentation

annotation variability

medical image segmentation

uncertainty calibration

inter-rater disagreement

Innovation

Methods, ideas, or system contributions that make the work stand out.

feature conditioning

frequency-domain personalization

multi-rater segmentation