🤖 AI Summary
This study addresses the challenge of accurately predicting early human fixation regions in visual search tasks with unknown target locations, aiming to model bottom-up visual attention allocation. To this end, it proposes two multi-feature fusion pipelines that systematically integrate structure-oriented Gabor filter responses with statistical texture features derived from the gray-level co-occurrence matrix (GLCM), a novel combination in this context. The approach is evaluated on simulated digital breast tomosynthesis images, where the predicted salient regions align closely with human observers' early eye movements and agree qualitatively with a conventional threshold-based model observer. These findings highlight the complementary roles of Gabor and GLCM features in visual information encoding and suggest a new pathway for developing perception-driven observer models.
📝 Abstract
Understanding human visual search behavior is a fundamental problem in vision science and computer vision, with direct implications for modeling how observers allocate attention in location-unknown search tasks. In this study, we investigate the relationship between Gabor-based features and gray-level co-occurrence matrix (GLCM) texture features in modeling early-stage visual search behavior. Two feature-combination pipelines are proposed that integrate Gabor and GLCM features to narrow the region of likely human fixations. The pipelines are evaluated on simulated digital breast tomosynthesis images. Results show qualitative agreement between the fixation candidates predicted by the proposed pipelines and those identified by a threshold-based model observer. A strong correlation (r = 0.765) is observed between the GLCM mean and Gabor feature responses, indicating that these features encode related image information despite their different formulations. Eye-tracking data from human observers further suggest consistency between the predicted fixation regions and early-stage gaze behavior. These findings highlight the value of combining structural and texture-based features for modeling visual search and support the development of perceptually informed observer models.
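The feature pairing described above can be sketched in a few lines of NumPy/SciPy. This is a minimal illustration on a synthetic image, not the paper's exact pipeline: the Gabor parameters, the 8-level patchwise GLCM, and the median-threshold AND fusion of the two feature maps are all illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

def gabor_kernel(size=9, sigma=2.0, theta=0.0, lam=4.0, gamma=0.5):
    """Real part of a Gabor filter (structure-oriented feature)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)

def glcm_mean(patch, levels=8):
    """GLCM mean of one patch for offset (0, 1): sum_ij i * P(i, j)."""
    lo, hi = patch.min(), patch.max()
    q = (np.zeros_like(patch, dtype=int) if hi == lo else
         np.minimum((levels * (patch - lo) / (hi - lo)).astype(int), levels - 1))
    glcm = np.zeros((levels, levels))
    for i, j in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):  # horizontal neighbors
        glcm[i, j] += 1
    p = glcm / glcm.sum()
    return float((np.arange(levels)[:, None] * p).sum())

# Synthetic 64x64 "image": noise background plus a bright blob as a stand-in target.
rng = np.random.default_rng(0)
img = rng.normal(size=(64, 64))
yy, xx = np.mgrid[:64, :64]
img += 3.0 * np.exp(-((yy - 32)**2 + (xx - 32)**2) / 50.0)

# Gabor feature: response magnitude averaged over four orientations.
resp = np.mean([np.abs(convolve(img, gabor_kernel(theta=t)))
                for t in (0, np.pi / 4, np.pi / 2, 3 * np.pi / 4)], axis=0)

# Pool both features over non-overlapping 8x8 patches -> 8x8 feature maps.
ps = 8
gabor_feat = np.array([[resp[r:r + ps, c:c + ps].mean()
                        for c in range(0, 64, ps)] for r in range(0, 64, ps)])
glcm_feat = np.array([[glcm_mean(img[r:r + ps, c:c + ps])
                       for c in range(0, 64, ps)] for r in range(0, 64, ps)])

# Pearson correlation between the two feature maps (the paper reports r = 0.765
# on its own data; the value here is for the synthetic image only).
r_corr = np.corrcoef(gabor_feat.ravel(), glcm_feat.ravel())[0, 1]

# Simple fusion: keep patches salient under BOTH features as fixation candidates.
fused = (gabor_feat > np.median(gabor_feat)) & (glcm_feat > np.median(glcm_feat))
```

Patches flagged in `fused` play the role of the narrowed fixation-candidate region; a real pipeline would operate on the tomosynthesis images and tune filter scale, GLCM offsets, and the fusion rule.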