Exploring a Multimodal Fusion-based Deep Learning Network for Detecting Facial Palsy

📅 2024-05-26
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
📄 PDF

career value

200K/year
🤖 AI Summary
Current clinical assessment of facial palsy suffers from subjectivity and labor intensity. To address this, this paper proposes a novel multimodal deep learning framework that, for the first time, jointly models unstructured facial geometric line-segment images and structured high-order expression features. The architecture comprises a ResNet-based image branch and a feed-forward neural network branch for structured features, integrated via a learnable feature-level weighted fusion mechanism. In mild facial palsy classification, the fused model achieves 77.05% accuracy—surpassing the best unimodal baseline (76.22%). Notably, the line-segment image branch attains an 83.47% recall, significantly enhancing lesion sensitivity. This work demonstrates that complementary multimodal representation effectively captures subtle clinical manifestations of facial palsy, thereby establishing a new paradigm for objective, interpretable, and automated facial palsy assessment.

Technology Category

Application Category

📝 Abstract
Algorithmic detection of facial palsy offers the potential to improve current practices, which usually involve labor-intensive and subjective assessment by clinicians. In this paper, we present a multimodal fusion-based deep learning model that utilizes unstructured data (i.e. an image frame with facial line segments) and structured data (i.e. features of facial expressions) to detect facial palsy. We then contribute to a study to analyze the effect of different data modalities and the benefits of a multimodal fusion-based approach using videos of 21 facial palsy patients. Our experimental results show that among various data modalities (i.e. unstructured data - RGB images and images of facial line segments and structured data - coordinates of facial landmarks and features of facial expressions), the feed-forward neural network using features of facial expression achieved the highest precision of 76.22 while the ResNet-based model using images of facial line segments achieved the highest recall of 83.47. When we leveraged both images of facial line segments and features of facial expressions, our multimodal fusion-based deep learning model slightly improved the precision score to 77.05 at the expense of a decrease in the recall score.
Problem

Research questions and friction points this paper is trying to address.

Develops a deep learning model for facial palsy detection.
Combines unstructured and structured data for improved accuracy.
Analyzes effectiveness of multimodal fusion in clinical assessments.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal fusion combines structured and unstructured data.
Deep learning models detect facial palsy with high precision.
Facial expression features and line segments enhance detection.
🔎 Similar Papers
No similar papers found.