🤖 AI Summary
Problem: Existing full-reference image quality assessment (FR-IQA) metrics, such as HaarPSI, are primarily optimized for natural images and perform suboptimally on medical imagery. Method: Using expert quality ratings on photoacoustic and chest X-ray data sets, we propose HaarPSI<sub>MED</sub>, a parameter configuration of HaarPSI tailored to medical images. Its two core parameters are optimized via grid search, and its cross-modal generalizability is validated on an independent CT data set. Contribution/Results: HaarPSI<sub>MED</sub> significantly improves agreement with the expert ratings (p < 0.05), yields similar optimal parameter values on both medical data sets, and shows visible improvements in qualitative examples. This work adapts HaarPSI to medical IQA and suggests a reproducible approach to customizing FR-IQA metrics for a specific domain.
📝 Abstract
When developing machine learning models, image quality assessment (IQA) measures are a crucial component for evaluating the output images obtained. However, commonly used full-reference IQA (FR-IQA) measures have been primarily developed and optimized for natural images. In many specialized settings, such as medical imaging, this raises an often overlooked question of suitability. In previous studies, the FR-IQA measure HaarPSI showed promising behavior regarding generalizability. The measure is based on Haar wavelet representations, and its framework allows the optimization of two parameters. So far, these parameters have been tuned for natural images. Here, we optimize these parameters for two medical image data sets, a photoacoustic and a chest X-ray data set, both with IQA expert ratings. We observe that the two data sets lead to similar parameter values, which differ from those obtained for natural image data, and that they are more sensitive to parameter changes. We denote the novel optimized setting as HaarPSI$_{MED}$, which significantly improves performance on the employed medical images (p<0.05). Additionally, we include an independent CT test data set that illustrates the generalizability of HaarPSI$_{MED}$, as well as visual examples that qualitatively demonstrate the improvement. The results suggest that adapting common IQA measures within their frameworks for medical images can provide a valuable, generalizable addition to the use of more specific task-based measures.
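The two-parameter optimization described above can be illustrated with a minimal sketch. It assumes that the selection criterion is the Spearman rank correlation between metric scores and expert ratings; the text does not state the exact criterion, so this is an illustrative choice. The function `toy_similarity` is a hypothetical stand-in for a two-parameter FR-IQA measure and is not HaarPSI itself. The parameter names `c` and `alpha`, the grids, and the data are likewise assumptions made for the example.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Image = Sequence[float]  # flattened image with non-negative pixel values


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined for constant input
    return cov / (vx * vy) ** 0.5


def toy_similarity(ref: Image, dist: Image, c: float, alpha: float) -> float:
    """Hypothetical two-parameter similarity score (NOT HaarPSI)."""
    sims = [(2 * a * b + c) / (a * a + b * b + c) for a, b in zip(ref, dist)]
    return sum(s ** alpha for s in sims) / len(sims)


def grid_search(
    score_fn: Callable[[Image, Image, float, float], float],
    pairs: Sequence[Tuple[Image, Image]],
    ratings: Sequence[float],
    c_grid: Sequence[float],
    alpha_grid: Sequence[float],
) -> Tuple[float, float, float]:
    """Return (c, alpha, rho) with the highest rank correlation to ratings."""
    best = None
    for c, alpha in product(c_grid, alpha_grid):
        scores = [score_fn(ref, dist, c, alpha) for ref, dist in pairs]
        rho = spearman(scores, ratings)
        if best is None or rho > best[2]:
            best = (c, alpha, rho)
    return best


# Made-up example data: (reference, distorted) pairs and expert ratings.
example_pairs = [
    ([0.2, 0.8], [0.2, 0.8]),
    ([0.2, 0.8], [0.3, 0.6]),
    ([0.2, 0.8], [0.9, 0.1]),
]
example_ratings = [5, 3, 1]
print(grid_search(toy_similarity, example_pairs, example_ratings,
                  c_grid=[0.1, 1.0], alpha_grid=[1.0, 2.0]))
```

In practice, `score_fn` would wrap an actual HaarPSI implementation that exposes its two parameters, and the selected setting would then be checked on held-out data, as the independent CT test set is used here.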