🤖 AI Summary
Existing research often examines the textual, visual, and audio modalities of short videos in isolation, failing to uncover how their interplay shapes user engagement. This work proposes the first reproducible and interpretable multimodal analysis framework that integrates automated feature extraction with Shapley-value-based attribution to systematically investigate how multimodal interactions affect view counts in TikTok content related to social anxiety disorder. The study reveals that facial expressions are more predictive of viewership than textual sentiment, that informational content garners more attention than emotional support, and that multimodal synergies exhibit strong threshold-dependent effects, thereby transcending the limitations of conventional unimodal analyses.
📄 Abstract
Short-form video platforms integrate text, visuals, and audio into complex communicative acts, yet existing research analyzes these modalities in isolation, lacking scalable frameworks to interpret their joint contributions. This study introduces a pipeline combining automated multimodal feature extraction with Shapley-value-based interpretability to analyze how text, visuals, and audio jointly influence engagement. Applying this framework to 162,965 TikTok videos and 814,825 images about social anxiety disorder (SAD), we find that facial expressions outperform textual sentiment in predicting viewership, informational content drives more attention than emotional support, and cross-modal synergies exhibit threshold-dependent effects. These findings demonstrate how multimodal analysis reveals interaction patterns invisible to single-modality approaches. Methodologically, we contribute a reproducible framework for interpretable multimodal research applicable across domains; substantively, we advance understanding of mental health communication in algorithmically mediated environments.
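The abstract does not specify implementation details, but the attribution step it describes can be illustrated with a short sketch. The snippet below is a hypothetical, minimal version of such a pipeline, not the authors' released code: it fits a gradient-boosting regressor on synthetic per-video multimodal features (all feature names and the choice of model are illustrative assumptions) and uses the `shap` library's `TreeExplainer` to attribute predicted view counts to each signal.

```python
# Minimal sketch of a Shapley-value attribution pipeline over multimodal
# video features. Synthetic data stands in for the extracted text, visual,
# and audio signals; feature names are invented for illustration.
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 1000

# Hypothetical per-video features, one column per modality-level signal.
X = pd.DataFrame({
    "text_sentiment":      rng.normal(0, 1, n),   # from caption/transcript
    "facial_expression":   rng.normal(0, 1, n),   # from video frames
    "audio_arousal":       rng.normal(0, 1, n),   # from soundtrack/voice
    "informational_score": rng.uniform(0, 1, n),  # informational vs. emotional
})

# Synthetic target standing in for log view counts, including a
# threshold-style interaction between visual and informational signals.
y = (0.8 * X["facial_expression"]
     + 0.5 * X["informational_score"]
     + 0.6 * (X["facial_expression"] > 0.5) * X["informational_score"]
     + rng.normal(0, 0.3, n))

model = GradientBoostingRegressor(random_state=0).fit(X, y)

# TreeExplainer computes Shapley values efficiently for tree ensembles;
# the mean absolute value per feature summarizes each signal's
# contribution to predicted engagement.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
importance = pd.Series(np.abs(shap_values).mean(axis=0), index=X.columns)
print(importance.sort_values(ascending=False))
```

In a real analysis, the synthetic columns would be replaced by features extracted from each modality, and pairwise interaction values (`explainer.shap_interaction_values`) could be examined to surface threshold-dependent cross-modal effects of the kind the study reports.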