ArtCognition: A Multimodal AI Framework for Affective State Sensing from Visual and Kinematic Drawing Cues

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This study addresses the challenge of objectively assessing human emotions and psychological states through nonverbal channels such as drawing. To this end, it proposes ArtCognition, a novel framework that, for the first time, integrates visual features of finished drawings with dynamic behavioral cues from the drawing process—such as stroke velocity and pauses—and leverages psychological knowledge to automate affective analysis of the House-Tree-Person test. The framework innovatively combines multimodal feature fusion with a retrieval-augmented generation (RAG) architecture to enhance the interpretability and reliability of psychological interpretations. Experimental results demonstrate that the extracted multimodal features exhibit significant correlations with standardized psychological metrics, underscoring the framework’s potential as a supportive tool in clinical psychological assessment.

📝 Abstract
The objective assessment of human affective and psychological states presents a significant challenge, particularly through non-verbal channels. This paper introduces digital drawing as a rich and underexplored modality for affective sensing. We present a novel multimodal framework, named ArtCognition, for the automated analysis of the House-Tree-Person (HTP) test, a widely used psychological instrument. ArtCognition uniquely fuses two distinct data streams: static visual features of the final artwork, captured by computer vision models, and dynamic kinematic cues derived from the drawing process itself, such as stroke speed, pauses, and smoothness. To bridge the gap between low-level features and high-level psychological interpretation, we employ a Retrieval-Augmented Generation (RAG) architecture. This grounds the analysis in established psychological knowledge, enhancing explainability and reducing the potential for model hallucination. Our results demonstrate that the fusion of visual and kinematic cues provides a more nuanced assessment than either modality alone. We show significant correlations between the extracted multimodal features and standardized psychological metrics, validating the framework's potential as a scalable tool to support clinicians. This work contributes a new methodology for non-intrusive affective state assessment and opens new avenues for technology-assisted mental healthcare.
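The abstract names stroke speed, pauses, and smoothness as the kinematic cues. The paper does not publish its feature definitions here, so the sketch below is a minimal illustration, assuming strokes arrive as timestamped (x, y, t) pen samples; the feature names, the pause threshold, and the speed-variability proxy for smoothness are all assumptions, not ArtCognition's actual formulas.

```python
import math

def stroke_kinematics(points, pause_threshold=0.25):
    """Compute illustrative kinematic features from one stroke.

    `points` is a list of (x, y, t) tuples: pen position in pixels
    and timestamp in seconds. Feature definitions are assumptions
    for illustration, not the paper's.
    """
    speeds = []
    pause_time = 0.0
    for (x0, y0, t0), (x1, y1, t1) in zip(points, points[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip out-of-order or duplicate timestamps
        v = math.hypot(x1 - x0, y1 - y0) / dt
        speeds.append(v)
        # Count a pause when the pen is effectively still long enough.
        if v < 1e-6 and dt >= pause_threshold:
            pause_time += dt
    mean_speed = sum(speeds) / len(speeds) if speeds else 0.0
    # Smoothness proxy: mean absolute change in speed between
    # consecutive segments (lower = smoother drawing).
    jerks = [abs(b - a) for a, b in zip(speeds, speeds[1:])]
    speed_variability = sum(jerks) / len(jerks) if jerks else 0.0
    return {"mean_speed": mean_speed,
            "pause_time": pause_time,
            "speed_variability": speed_variability}
```

Per-stroke features like these would then be aggregated over the whole drawing session before fusion with the visual stream.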
Problem

Research questions and friction points this paper is trying to address.

affective state sensing
digital drawing
psychological assessment
non-verbal behavior
multimodal analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal fusion
kinematic cues
Retrieval-Augmented Generation (RAG)
affective sensing
digital drawing
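The RAG component listed above grounds interpretations in established psychological knowledge. As a minimal sketch of the retrieval half of that idea, the toy example below ranks knowledge snippets against a feature-derived query by bag-of-words cosine similarity; the snippets, tokenizer, and scoring are placeholder assumptions (a real system would use dense embeddings and a curated HTP knowledge base), not the paper's implementation.

```python
import math
import re
from collections import Counter

# Toy knowledge base of HTP-style interpretation notes.
# These entries are illustrative placeholders, not the paper's corpus.
KNOWLEDGE = [
    "Very small house drawings may indicate withdrawal.",
    "Heavy, fast strokes can reflect tension or impulsivity.",
    "Long pauses before drawing a person may signal hesitation.",
]

def _bow(text):
    """Lowercased bag-of-words vector (a stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def _cosine(a, b):
    num = sum(a[w] * b[w] for w in set(a) & set(b))
    denom = (math.sqrt(sum(v * v for v in a.values()))
             * math.sqrt(sum(v * v for v in b.values())))
    return num / denom if denom else 0.0

def retrieve(query, k=2):
    """Return the k knowledge snippets most similar to the query."""
    q = _bow(query)
    ranked = sorted(KNOWLEDGE, key=lambda s: _cosine(q, _bow(s)),
                    reverse=True)
    return ranked[:k]
```

In a full RAG pipeline, the retrieved snippets would be passed to a generative model alongside the extracted features, constraining its interpretation to the retrieved evidence.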
Behrad Binaei-Haghighi
Department of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
Nafiseh Sadat Sajadi
Tehran Institute for Advanced Studies, Khatam University, Tehran, Iran
Mehrad Liviyan
Department of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
Reyhane Akhavan Kharazi
Department of Electrical and Computer Engineering, University of Tehran, Tehran, Iran
Fatemeh Amirkhani
Department of Psychology, Allameh Tabataba’i University, Tehran, Iran
Behnam Bahrak
Tehran Institute for Advanced Studies