Advancing Automated Spatio-Semantic Analysis in Picture Description Using Language Models

πŸ“… 2025-09-30
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Existing automated evaluation methods for picture description in cognitive-linguistic disorders often neglect the visual narrative pathβ€”the spatiotemporal ordering of descriptive elements. To address this, we propose the first end-to-end BERT-based framework that jointly models content information unit (CIU) detection and sequential ordering. Our approach synergistically optimizes CIU identification and narrative path reconstruction using binary cross-entropy loss and pairwise ranking loss. Under five-fold cross-validation, the model achieves median precision and recall of 93% and 96% for CIU detection, respectively, with a sequence error rate of only 24%. It demonstrates strong Pearson correlation (r > 0.8) with human annotations across features, and inter-group discrimination performance matches that of expert raters. All models and code are publicly released. This work establishes a novel, interpretable, and high-accuracy automated paradigm for clinical language assessment.

Technology Category

Application Category

πŸ“ Abstract
Current methods for automated assessment of cognitive-linguistic impairment via picture description often neglect the visual narrative path - the sequence and locations of elements a speaker described in the picture. Analyses of spatio-semantic features capture this path using content information units (CIUs), but manual tagging or dictionary-based mapping is labor-intensive. This study proposes a BERT-based pipeline, fine tuned with binary cross-entropy and pairwise ranking loss, for automated CIU extraction and ordering from the Cookie Theft picture description. Evaluated by 5-fold cross-validation, it achieves 93% median precision, 96% median recall in CIU detection, and 24% sequence error rates. The proposed method extracts features that exhibit strong Pearson correlations with ground truth, surpassing the dictionary-based baseline in external validation. These features also perform comparably to those derived from manual annotations in evaluating group differences via ANCOVA. The pipeline is shown to effectively characterize visual narrative paths for cognitive impairment assessment, with the implementation and models open-sourced to public.
Problem

Research questions and friction points this paper is trying to address.

Automating spatio-semantic analysis of picture descriptions
Reducing labor-intensive manual tagging of content units
Improving cognitive impairment assessment through visual narrative paths
Innovation

Methods, ideas, or system contributions that make the work stand out.

BERT-based pipeline automates CIU extraction
Fine-tuned with cross-entropy and ranking loss
Achieves high precision and recall in detection
πŸ”Ž Similar Papers
No similar papers found.
S
Si-Ioi Ng
Arizona State University, USA
P
Pranav S. Ambadi
Arizona State University, USA
K
Kimberly D. Mueller
University of Wisconsin-Madison, USA
J
Julie Liss
Arizona State University, USA
Visar Berisha
Visar Berisha
Professor, College of Engineering and College of Health Solutions, Arizona State University
Speech and audio AIClinical speech analyticsMachine learningHealthcare AI