🤖 AI Summary
This work proposes Uni-FinLLM, presented as the first unified multimodal large language model for financial risk modeling that jointly captures micro-, meso-, and macro-scale dependencies, addressing the limitation of existing approaches that treat these scales in isolation. By integrating financial text, time-series data, fundamental indicators, and visual information through a shared Transformer backbone, modular task-specific heads, and cross-modal attention mechanisms, the model enables end-to-end collaborative prediction across risk levels. Empirical results show clear gains over strong baselines: 67.4% accuracy in stock direction prediction (up 5.7 percentage points), 84.1% in credit risk assessment, and 82.3% in macroeconomic risk early warning. The study thus establishes a framework for end-to-end joint modeling and optimization of cross-scale financial risks.
📝 Abstract
Financial institutions and regulators require systems that integrate heterogeneous data to assess risks from stock fluctuations to systemic vulnerabilities. Existing approaches often treat these tasks in isolation, failing to capture cross-scale dependencies. We propose Uni-FinLLM, a unified multimodal large language model that uses a shared Transformer backbone and modular task heads to jointly process financial text, numerical time series, fundamentals, and visual data. Through cross-modal attention and multi-task optimization, it learns a coherent representation for micro-, meso-, and macro-level predictions. Evaluated on stock forecasting, credit-risk assessment, and systemic-risk detection, Uni-FinLLM significantly outperforms baselines. It raises stock directional accuracy to 67.4% (from 61.7%), credit-risk accuracy to 84.1% (from 79.6%), and macro early-warning accuracy to 82.3%. Results validate that a unified multimodal LLM can jointly model asset behavior and systemic vulnerabilities, offering a scalable decision-support engine for finance.
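The architecture the abstract describes — per-modality inputs fused by cross-modal attention, a shared backbone, and modular task heads — can be sketched in miniature. The following NumPy toy is an illustration only: all dimensions, weight initializations, and head names are assumptions for the sketch, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # shared embedding dimension (illustrative, not from the paper)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(q_tokens, kv_tokens, Wq, Wk, Wv):
    """Single-head attention: one modality's tokens attend over another's."""
    Q, K, V = q_tokens @ Wq, kv_tokens @ Wk, kv_tokens @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    return softmax(scores) @ V

# Hypothetical token sequences, already projected into the shared space.
text_tokens = rng.standard_normal((8, D))     # e.g. financial text
series_tokens = rng.standard_normal((12, D))  # e.g. numerical time series

Wq, Wk, Wv = (rng.standard_normal((D, D)) * 0.1 for _ in range(3))
fused = cross_modal_attention(text_tokens, series_tokens, Wq, Wk, Wv)

# Shared "backbone" layer, then a pooled representation fed to task heads.
W_backbone = rng.standard_normal((D, D)) * 0.1
h = np.tanh(fused @ W_backbone).mean(axis=0)

# Modular task heads over the one shared representation (multi-task setup).
heads = {
    "stock_direction": rng.standard_normal((D, 2)) * 0.1,  # micro level
    "credit_risk": rng.standard_normal((D, 3)) * 0.1,      # meso level
    "macro_warning": rng.standard_normal((D, 2)) * 0.1,    # macro level
}
outputs = {name: softmax(h @ W) for name, W in heads.items()}
```

In a multi-task setup like the one described, each head's loss would be backpropagated through the same backbone, which is what lets the micro-, meso-, and macro-level tasks share one representation.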