🤖 AI Summary
Image memorability (the probability that an image is remembered after a single exposure) lacks a well-established computational model, and its key visual determinants remain unclear.
Method: We propose a VGG16-based autoencoder trained for a single epoch to simulate human memory after a single exposure; to our knowledge, this is the first work to model reconstruction error directly as a memorability predictor. We further apply Integrated Gradients for interpretability, identifying semantically salient regions and high-contrast textures as memorability cues.
Contribution/Results: We find that the inter-class discriminability of the latent representations is a stronger predictor of memorability than conventional features, showing a significant positive correlation. Reconstruction error exhibits a significant negative correlation with human memorability scores (p < 0.001), and an MLP regressor achieves a Spearman correlation of 0.72. This work establishes a novel, interpretable computational paradigm for studying the neural and perceptual foundations of visual memory.
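Both reported results (the negative correlation of reconstruction error with memorability, and the MLP's score of 0.72) are stated as Spearman rank correlations. A small self-contained sketch of that metric on made-up numbers; the data and the tie-free `spearman` helper are illustrative, not the paper's evaluation code:

```python
import numpy as np


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks.
    Simplified illustration that assumes no tied values."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))


# Hypothetical numbers: as reconstruction error rises, memorability falls.
err = np.array([0.1, 0.2, 0.3, 0.5, 0.7])  # made-up reconstruction errors
mem = np.array([0.9, 0.8, 0.6, 0.4, 0.2])  # made-up memorability scores
print(spearman(err, mem))  # -1.0: ranks are perfectly anti-correlated
```

A negative value here is the shape of the paper's reconstruction-error result, while the MLP's reported 0.72 would be computed the same way between predicted and human memorability scores on held-out images.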
📝 Abstract
Image memorability refers to the phenomenon whereby certain images are more likely to be remembered than others; it is a quantifiable, intrinsic image attribute, defined as the likelihood that an image is remembered after a single exposure. Despite advances in understanding human visual perception and memory, it remains unclear which features make an image memorable. To address this question, we propose a deep learning-based computational modeling approach. We build an autoencoder on the VGG16 convolutional neural network (CNN) to learn latent representations of images. The model is trained for a single epoch, mirroring human memory experiments that assess recall after a single exposure. We examine the relationship between the autoencoder's reconstruction error and memorability, analyze the distinctiveness of the latent-space representations, and develop a multi-layer perceptron (MLP) model for memorability prediction. Additionally, we perform interpretability analysis using Integrated Gradients (IG) to identify the key visual characteristics that contribute to memorability. Our results demonstrate a significant correlation between images' memorability scores and the autoencoder's reconstruction error, as well as robust predictive performance from its latent representations. Distinctiveness of these representations also correlated significantly with memorability, and the IG analysis identified specific visual characteristics that contribute to image memorability in our model. These findings suggest that autoencoder-based representations capture fundamental aspects of image memorability, providing new insights into the computational modeling of human visual memory.
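The Integrated Gradients analysis mentioned in the abstract attributes a model's output back to input pixels by averaging gradients along a path from a baseline image to the actual image. Below is a generic Riemann-sum implementation of IG demonstrated on a toy linear model where the attributions are exact; the `integrated_gradients` helper, the zero baseline, and the step count are assumptions, not the paper's pipeline:

```python
import torch


def integrated_gradients(model, x, baseline=None, steps=50):
    """Riemann-sum approximation of Integrated Gradients:
    (x - baseline) times the average gradient of the model output
    along the straight-line path from baseline to x."""
    if baseline is None:
        baseline = torch.zeros_like(x)  # assumed baseline: a black image
    total = torch.zeros_like(x)
    for alpha in torch.linspace(1.0 / steps, 1.0, steps):
        point = (baseline + alpha * (x - baseline)).requires_grad_(True)
        out = model(point).sum()  # scalar output assumed (e.g. memorability score)
        (grad,) = torch.autograd.grad(out, point)
        total += grad
    return (x - baseline) * total / steps


# Sanity check on a linear model, where IG is exact: attribution = w * (x - baseline).
w = torch.tensor([1.0, -2.0, 3.0])
linear = lambda t: (t * w).sum(-1)
x = torch.ones(3)
attr = integrated_gradients(linear, x)
print(torch.allclose(attr, w * x))  # True
```

IG also satisfies a completeness property (attributions sum to the output difference between `x` and the baseline), which is what makes the resulting pixel maps interpretable as contributions to the predicted memorability score.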