🤖 AI Summary
This study addresses the poor performance of existing models on long-tailed disease classification in chest X-rays. We propose a novel method that integrates radiologists' eye-tracking-guided temporal attention patterns, introducing, for the first time, dynamic, sequential visual search attention into long-tailed learning frameworks. Unlike static attention modeling, our approach features an integration-and-disintegration dual-path architecture: one path captures global contextual information, while the other models the temporal evolution of localized pathological regions. The method combines eye-tracking-driven attention guidance, deep neural networks, and a long-tailed loss function. Evaluated on the NIH-CXR-LT and MIMIC-CXR-LT benchmarks, it achieves average accuracy improvements of 4.1% over the best-performing long-tailed loss baseline and 21.7% over mainstream attention-based methods. Notably, it markedly improves recognition of rare diseases and incidental, transient lesions, key challenges in clinical radiology.
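The authors' implementation is in the linked repository; the PyTorch sketch below is only a rough illustration of the dual-path idea described above, under our own assumptions. One encoder sees the whole image (global context), a second encoder is applied to the image weighted by each temporal gaze heatmap, and a GRU summarizes the sequence before fusion. All module names, the toy backbone, and the tensor shapes are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn


class DualPathGazeClassifier(nn.Module):
    """Illustrative dual-path model: global context + temporal gaze-local path."""

    def __init__(self, num_classes: int, feat_dim: int = 128):
        super().__init__()

        def encoder():
            # Tiny stand-in backbone; any CNN producing a feat_dim vector works.
            return nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=7, stride=2, padding=3),
                nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),
                nn.Linear(32, feat_dim),
            )

        self.global_encoder = encoder()   # whole-image context
        self.local_encoder = encoder()    # gaze-weighted views, one per step
        self.temporal = nn.GRU(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, image: torch.Tensor, gaze_maps: torch.Tensor):
        # image: (B, 1, H, W); gaze_maps: (B, T, 1, H, W), one map per time step
        g = self.global_encoder(image)                           # (B, D)
        b, t = gaze_maps.shape[:2]
        # Disintegrated view: the image weighted by each temporal gaze map.
        local = (image.unsqueeze(1) * gaze_maps).flatten(0, 1)   # (B*T, 1, H, W)
        z = self.local_encoder(local).view(b, t, -1)             # (B, T, D)
        _, h = self.temporal(z)                                  # h: (1, B, D)
        # Integration: fuse global context with the temporal-local summary.
        return self.head(torch.cat([g, h.squeeze(0)], dim=-1))


# Usage with random tensors (batch of 2 images, 8 gaze time steps):
model = DualPathGazeClassifier(num_classes=20)
logits = model(torch.randn(2, 1, 224, 224), torch.rand(2, 8, 1, 224, 224))
```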
📝 Abstract
In this work, we present GazeLT, a human visual attention integration-disintegration approach for long-tailed disease classification. A radiologist's eye gaze has distinct patterns that capture both fine-grained and coarse-level disease-related information. A radiologist's attention also varies over the course of interpreting an image, and incorporating this temporal variation into a deep learning framework is critical for improving automated image interpretation. Another important aspect of visual attention is that, apart from looking at major/obvious disease patterns, experts also examine minor/incidental findings (some of which constitute long-tailed classes) during image interpretation. GazeLT harnesses the temporal aspect of this visual search process, via an integration and disintegration mechanism, to improve long-tailed disease classification. We show the efficacy of GazeLT on two publicly available datasets for long-tailed disease classification, NIH-CXR-LT (n=89,237) and MIMIC-CXR-LT (n=111,898). GazeLT outperforms the best long-tailed loss by 4.1% and the visual attention-based baseline by 21.7% in average accuracy on these datasets. Our code is available at https://github.com/lordmoinak1/gazelt.
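The mechanism operates on the temporal structure of the gaze data. As a minimal sketch of a typical preprocessing step (our assumption, not the released code), the snippet below bins raw fixations into temporal windows, a disintegrated view of the scanpath, and renders each window as a Gaussian-smoothed attention map; summing the maps recovers the integrated, full-duration attention. The function name and parameters are hypothetical.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def fixations_to_heatmaps(fixations, height, width, num_steps=8, sigma=15.0):
    """fixations: iterable of (x, y, t) tuples, t in seconds.
    Returns an array of shape (num_steps, height, width)."""
    times = np.array([t for _, _, t in fixations], dtype=np.float64)
    # Disintegration: split the scanpath into num_steps equal time windows.
    edges = np.linspace(times.min(), times.max() + 1e-6, num_steps + 1)
    maps = np.zeros((num_steps, height, width), dtype=np.float32)
    for x, y, t in fixations:
        step = int(np.clip(np.searchsorted(edges, t, side="right") - 1,
                           0, num_steps - 1))
        yi = int(np.clip(round(y), 0, height - 1))
        xi = int(np.clip(round(x), 0, width - 1))
        maps[step, yi, xi] += 1.0  # accumulate fixation hits per window
    for k in range(num_steps):
        maps[k] = gaussian_filter(maps[k], sigma=sigma)  # smooth hits into a map
        peak = maps[k].max()
        if peak > 0:
            maps[k] /= peak  # normalize each window's map to [0, 1]
    # Integration: maps.sum(axis=0) recovers the full-duration attention map.
    return maps
```

These per-step maps, stacked across time, correspond to the kind of gaze-map sequence consumed by the dual-path sketch shown after the summary above.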