🤖 AI Summary
This study investigates whether pretrained vision models can accurately predict human fear responses to spider images, a capability needed for emotion-adaptive computerized exposure therapy. We evaluated ResNet, ViT, and ConvNeXt using transfer learning and five-fold cross-validation to regress fear scores (0–100) on a dataset of 313 annotated spider images. To enhance interpretability, we incorporated attention visualization and category-wise error analysis, which revealed that the models attend to semantically meaningful features such as spider morphology and posture. The best-performing model achieved a mean absolute error of 10.1, and learning curves indicate that performance saturates at the current data scale. Our key contributions are threefold: (1) the first systematic validation of general-purpose vision models for fine-grained fear response regression; (2) an interpretability-driven framework for emotion-aware image assessment; and (3) empirical evidence that sufficient dataset scale and interpretable features are jointly necessary for clinically reliable AI.
📝 Abstract
Advances in computer vision have opened new avenues for clinical applications, particularly in computerized exposure therapy, where visual stimuli can be dynamically adjusted based on patient responses. As a critical step toward such adaptive systems, we investigated whether pretrained computer vision models can accurately predict fear levels from spider-related images. We adapted three architecturally diverse models via transfer learning to predict human fear ratings (on a 0–100 scale) from a standardized dataset of 313 images. Evaluated with five-fold cross-validation, the models achieved average mean absolute errors (MAE) between 10.1 and 11.0. Learning curve analysis revealed that shrinking the training set substantially degraded performance, whereas performance plateaued as the training set approached its full size. Explainability assessments showed that the models' predictions were driven by spider-related features, and a category-wise error analysis identified visual conditions associated with higher errors (e.g., distant views and artificial or painted spiders). These findings demonstrate the potential of explainable computer vision models for predicting fear ratings and highlight the importance of both model explainability and sufficient dataset size for developing emotion-aware therapeutic technologies.
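The evaluation protocol described above (five-fold cross-validation of a regression head on 313 images, scored by MAE on the 0–100 fear scale) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes backbone embeddings (e.g., penultimate-layer features from ResNet, ViT, or ConvNeXt) have already been extracted, and substitutes random 512-dimensional vectors with a synthetic fear signal in their place, with a ridge regressor standing in for the fine-tuned head.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)

# Stand-ins for the 313 annotated images: synthetic 512-d "embeddings"
# with a linear fear signal plus noise, clipped to the 0-100 rating scale.
X = rng.normal(size=(313, 512))
y = np.clip(X[:, :5].sum(axis=1) * 8 + 50 + rng.normal(scale=5, size=313), 0, 100)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_mae = []
for train_idx, test_idx in kf.split(X):
    head = Ridge(alpha=1.0)                  # regression head on fixed features
    head.fit(X[train_idx], y[train_idx])
    pred = np.clip(head.predict(X[test_idx]), 0, 100)  # keep predictions on-scale
    fold_mae.append(mean_absolute_error(y[test_idx], pred))

print(f"mean MAE over 5 folds: {np.mean(fold_mae):.1f}")
```

Averaging MAE over the five held-out folds mirrors how the 10.1–11.0 figures above are reported; in the actual study the ridge head would be replaced by end-to-end fine-tuning of each pretrained backbone.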