Capsule Network-Based Multimodal Fusion for Mortgage Risk Assessment from Unstructured Data Sources

📅 2025-10-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional mortgage risk assessment relies heavily on costly, non-public structured data, limiting scalability and accessibility in inclusive finance. Method: This paper proposes a multimodal credit evaluation framework that leverages free, publicly available unstructured data (text, images, and sentiment scores) through FusionCapsNet, a novel capsule-network fusion architecture. The architecture adaptively fuses features extracted by BERT (text), VGG (images), and an MLP (sentiment scores), preserving spatial hierarchies and semantic context. It further incorporates sentiment analysis across distinct news categories and Grad-CAM-based visualization to enhance model interpretability and decision transparency. Contribution/Results: Extensive experiments demonstrate that the proposed method consistently outperforms unimodal baselines and mainstream fusion strategies (addition, concatenation, and cross-attention) in AUC, partial AUC (pAUC), and F1 score. Critically, it substantially reduces dependence on proprietary data, offering a scalable, transparent, and privacy-conscious paradigm for credit risk assessment in inclusive financial systems.
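The summary's "adaptively fuses" refers to a capsule-style routing step. The paper's FusionCapsNet redesigns this mechanism per modality, and those details are not public here; as a minimal sketch, the standard routing-by-agreement procedure from vanilla capsule networks (Sabour et al., 2017) shows how coupling coefficients let stronger, more consistent inputs dominate the fused output. All shapes and the NumPy implementation are illustrative assumptions, not the paper's code.

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    # Capsule nonlinearity: shrinks the vector norm into [0, 1) while
    # preserving its direction, so norm can act as a confidence measure.
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def route(u_hat, n_iters=3):
    """Routing-by-agreement over prediction vectors.

    u_hat: (n_in, n_out, d) predictions from input capsules
           (here, one input capsule per modality would be a natural choice).
    Returns output capsules (n_out, d) and coupling coefficients (n_in, n_out).
    """
    n_in, n_out, _ = u_hat.shape
    b = np.zeros((n_in, n_out))  # routing logits, start uniform
    for _ in range(n_iters):
        # Softmax over output capsules: each input distributes its vote.
        c = np.exp(b) / np.exp(b).sum(axis=1, keepdims=True)
        s = (c[..., None] * u_hat).sum(axis=0)   # weighted sum -> (n_out, d)
        v = squash(s)                            # (n_out, d)
        # Inputs that agree with the output get a larger coupling next round.
        b = b + (u_hat * v[None]).sum(axis=-1)
    return v, c

# Toy example: 3 input capsules (e.g. text/image/sentiment) routed to 2 outputs.
rng = np.random.default_rng(0)
u_hat = rng.standard_normal((3, 2, 8)) * 0.1
v, c = route(u_hat)
```

The agreement update is what produces adaptive weighting: a modality whose predictions align with the emerging consensus is coupled more strongly, while disagreeing modalities are down-weighted but never zeroed out.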

📝 Abstract
Mortgage risk assessment traditionally relies on structured financial data, which is often proprietary, confidential, and costly. In this study, we propose a novel multimodal deep learning framework that uses cost-free, publicly available, unstructured data sources, including textual information, images, and sentiment scores, to generate credit scores that approximate commercial scorecards. Our framework adopts a two-phase approach. In the unimodal phase, we identify the best-performing models for each modality, i.e., BERT for text, VGG for image data, and a multilayer perceptron for sentiment-based features. In the fusion phase, we introduce the capsule-based fusion network (FusionCapsNet), a novel fusion strategy inspired by capsule networks but fundamentally redesigned for multimodal integration. Unlike standard capsule networks, our method adapts a specific capsule mechanism to each modality and restructures the fusion process to preserve spatial, contextual, and modality-specific information. It also enables adaptive weighting so that stronger modalities dominate without ignoring complementary signals. Our framework incorporates sentiment analysis across distinct news categories to capture borrower and market dynamics and employs Grad-CAM-based visualizations as an interpretability tool. These components are built into the framework by design, and our results demonstrate that they effectively enrich contextual understanding and highlight the influential factors driving mortgage risk predictions. Our results show that our multimodal FusionCapsNet framework not only exceeds individual unimodal models but also outperforms benchmark fusion strategies such as addition, concatenation, and cross-attention in terms of AUC, partial AUC, and F1 score, demonstrating clear gains in both predictive accuracy and interpretability for mortgage risk assessment.
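The abstract's interpretability component is Grad-CAM, a standard technique (Selvaraju et al.) rather than something specific to this paper: channel-importance weights are the spatially averaged gradients of the target score, and the heatmap is the ReLU of the weighted sum of feature maps. A minimal NumPy sketch of that computation, with toy array shapes as assumptions:

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM heatmap for one convolutional layer.

    activations: (C, H, W) feature maps from the layer (e.g. a VGG conv block)
    gradients:   (C, H, W) gradient of the target class score w.r.t. activations
    Returns a (H, W) heatmap normalized to [0, 1].
    """
    # Channel importance = global average pooling of the gradients.
    weights = gradients.mean(axis=(1, 2))              # (C,)
    # Weighted combination of feature maps, then ReLU to keep
    # only regions with a positive influence on the score.
    cam = np.tensordot(weights, activations, axes=1)   # (H, W)
    cam = np.maximum(cam, 0.0)
    if cam.max() > 0:
        cam = cam / cam.max()                          # scale to [0, 1] for display
    return cam

# Toy example with made-up activations/gradients (shapes are illustrative).
rng = np.random.default_rng(0)
cam = grad_cam(rng.standard_normal((64, 7, 7)),
               rng.standard_normal((64, 7, 7)))
```

In practice the activations and gradients would come from a forward/backward pass through the image branch; upsampling the heatmap to the input resolution then shows which regions drove the risk prediction.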
Problem

Research questions and friction points this paper is trying to address.

Assessing mortgage risk using free unstructured data instead of costly structured data
Developing capsule network-based fusion to integrate text, image, and sentiment data
Improving predictive accuracy and interpretability for mortgage risk assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Capsule network fusion for multimodal data integration
BERT and VGG models process text and images
Adaptive weighting preserves modality-specific information