🤖 AI Summary
This study addresses the limitations of manual depression assessment in the PPAT (Person Picking an Apple from a Tree) drawing projection test, which is laborious, inconsistent across raters, and hard to scale. We propose VS-LLM (Visual-Semantic depression assessment based on LLM), a framework that combines computer vision with multimodal large language models to automate depression assessment from PPAT sketches. VS-LLM parses global visual attributes such as color usage, composition, and spatial layout into semantic descriptions, yielding interpretable psychological representations. Experimental results show a 17.6% improvement over the psychologist assessment method. The authors also release a PPAT depression-annotated dataset and the full implementation code. This work aims to support objective, scalable psychological evaluation in art therapy.
📝 Abstract
The Drawing Projection Test (DPT) is an essential tool in art therapy, allowing psychologists to assess participants' mental states through their sketches. In particular, sketches on the theme "a person picking an apple from a tree" (PPAT) can reveal whether participants are in mental states such as depression. Compared with rating scales, the DPT can give psychologists a richer understanding of an individual's mental state. However, interpreting PPAT sketches is laborious and depends on the psychologist's experience. To address this issue, we propose an effective identification method that supports psychologists in conducting large-scale automated DPT. Unlike traditional sketch recognition, DPT focuses on the overall evaluation of a sketch, such as color usage and space utilization. Moreover, PPAT imposes a time limit and prohibits verbal prompts, which results in low drawing accuracy and little detailed depiction. To address these challenges, we make the following contributions: (1) we provide an experimental environment for the automated analysis of PPAT sketches for depression assessment; (2) we propose VS-LLM, a Visual-Semantic depression assessment method based on large language models (LLMs); (3) experimental results show that our method outperforms the psychologist assessment method by 17.6%. We anticipate that this work will contribute to research on mental state assessment based on recognizing the elements of PPAT sketches. Our datasets and code are available at https://github.com/wmeiqi/VS-LLM.
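The abstract stresses that DPT evaluation relies on global properties such as color usage and space utilization rather than on recognizing individual objects. As a rough illustration only, the sketch below computes three such properties from a drawing and renders them as a sentence that a language model could consume. The function names, the white-background assumption, and the specific features (ink ratio, distinct colors, bounding-box coverage) are hypothetical and are not the VS-LLM implementation. Real scans would also need color quantization first, since anti-aliasing and lighting produce many near-duplicate colors.

```python
from collections import Counter
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]
WHITE: RGB = (255, 255, 255)  # assumption: the paper/background is pure white


def sketch_features(pixels: List[List[RGB]], background: RGB = WHITE) -> Dict[str, float]:
    """Compute simple global features of a sketch (hypothetical, illustrative only).

    pixels is a row-major grid of RGB tuples; any non-background pixel counts as ink.
    """
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    total = height * width
    # Coordinates of every drawn (non-background) pixel.
    ink = [(r, c) for r, row in enumerate(pixels) for c, px in enumerate(row) if px != background]
    if not ink:
        return {"ink_ratio": 0.0, "num_colors": 0, "bbox_ratio": 0.0}
    # Color usage: how many distinct exact colors appear in the strokes.
    colors = Counter(pixels[r][c] for r, c in ink)
    # Space utilization: area of the smallest box enclosing all strokes.
    rows = [r for r, _ in ink]
    cols = [c for _, c in ink]
    bbox_area = (max(rows) - min(rows) + 1) * (max(cols) - min(cols) + 1)
    return {
        "ink_ratio": len(ink) / total,
        "num_colors": len(colors),
        "bbox_ratio": bbox_area / total,
    }


def describe(features: Dict[str, float]) -> str:
    """Render the features as plain text, e.g. as part of an LLM prompt (hypothetical)."""
    return (
        f"The drawing uses {int(features['num_colors'])} distinct color(s), "
        f"covers {features['ink_ratio']:.0%} of the canvas with strokes, "
        f"and its drawn region spans {features['bbox_ratio']:.0%} of the page."
    )


if __name__ == "__main__":
    W, R, G = WHITE, (255, 0, 0), (0, 128, 0)
    # A tiny 4x4 "drawing": green foliage with one red apple pixel.
    img = [[W, W, W, W], [W, G, G, W], [W, G, R, W], [W, W, W, W]]
    print(describe(sketch_features(img)))
```

On the 4x4 example, four of sixteen pixels are drawn in two colors, so the description reports 2 colors and 25% coverage for both ink and bounding box. Turning numeric features into text is one plausible way to hand visual evidence to an LLM; whether VS-LLM does this, and with which features, is described in the paper itself.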