Disentangling Shared and Task-Specific Representations from Multi-Modal Clinical Data

📅 2026-05-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses negative transfer in multi-task learning on multimodal clinical data, a problem that hinders joint modeling of related but heterogeneous clinical outcomes. To overcome this limitation, the authors propose a unified Transformer-based multi-task framework with an Orthogonal Task Decomposition (OrthTD) mechanism. OrthTD explicitly splits patient representations into shared and task-specific subspaces and enforces a geometric orthogonality constraint between them to suppress redundancy and isolate task-unique signals. Evaluated on 12,430 surgical patients for predicting four clinical outcomes, the model achieves an average AUC of 87.5% and an average AUPRC of 37.2%, consistently outperforming advanced tabular and multi-task baselines. Its gains are largest in AUPRC, which points to better detection of rare events.
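The link between AUPRC gains and rare-event detection comes from how the two metrics behave under class imbalance. The Python sketch below is illustrative and not taken from the paper: it computes both metrics from scratch on a hypothetical toy ranking of 20 patients with 2 events (prevalence 10%). A random ranking has an expected AUC of 0.5 but an expected AUPRC of roughly the event rate, so AUPRC values should be read against each outcome's prevalence rather than against 50%. The function names and data are assumptions made for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC estimate: mean precision at the rank of each positive (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos


# Hypothetical toy cohort: 20 patients, 2 events, listed in descending score order.
labels = [0, 0, 1, 0, 0, 1] + [0] * 14
scores = [1.0 - 0.05 * i for i in range(20)]

prevalence = sum(labels) / len(labels)
print(f"prevalence={prevalence:.2f}")
print(f"AUC={roc_auc(labels, scores):.3f}")
print(f"AUPRC={average_precision(labels, scores):.3f}")
```

On this toy ranking the AUC is about 0.833 while the AUPRC is about 0.333. The few negatives ranked above each positive barely move AUC, because they are a small fraction of all negatives, but they pull precision down sharply. This is why AUPRC is the more discriminating metric when positives are rare.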

📝 Abstract

Real-world clinical data is inherently multimodal, and its complementary sources of evidence mirror the practical need to assess multiple related outcomes jointly. Although multi-task learning can improve efficiency by sharing information across outcomes, existing approaches often fail to balance shared representation learning with outcome-specific modeling. Hard parameter sharing can trigger negative transfer when task gradients conflict, while flexible sharing may still entangle shared and task-specific signals. To address this, we propose a multi-task framework built on a unified Transformer for multimodal fusion, augmented with Orthogonal Task Decomposition (OrthTD), which splits patient representations into shared and task-specific subspaces and imposes a geometric orthogonality constraint between them to reduce redundancy and isolate task-specific signals. We evaluated OrthTD on a real-world cohort of 12,430 surgical patients for predicting four outcomes. OrthTD achieved an average AUC (area under the receiver operating characteristic curve) of 87.5% and an average AUPRC (area under the precision-recall curve) of 37.2%, and consistently outperformed advanced tabular and multi-task methods. Notably, OrthTD achieved substantial gains in AUPRC, indicating superior performance in identifying rare events within imbalanced clinical data. These results suggest that enforcing non-redundant shared and task-specific representations can improve multi-outcome prediction from multimodal clinical data.
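The abstract describes OrthTD only as splitting patient representations into shared and task-specific subspaces and penalizing their overlap. As a minimal sketch, assuming a soft penalty of the kind used in domain-separation models (the squared Frobenius norm of the cross-correlation between a batch of shared features and a batch of task-specific features), the training loss could be assembled as below. The function names, the weight `lam`, and the choice to penalize each task's subspace only against the shared one are hypothetical; the paper's exact constraint is not given in this summary.

```python
from typing import List, Sequence

Matrix = List[List[float]]  # rows = patients in a batch, columns = feature dimensions


def cross_correlation(a: Matrix, b: Matrix) -> Matrix:
    """Return a^T @ b for a (n x p) and b (n x q), giving a p x q matrix."""
    n = len(a)
    p, q = len(a[0]), len(b[0])
    return [
        [sum(a[k][i] * b[k][j] for k in range(n)) for j in range(q)]
        for i in range(p)
    ]


def orthogonality_penalty(shared: Matrix, specific: Matrix) -> float:
    """Squared Frobenius norm ||shared^T specific||_F^2; zero when every shared
    feature column is orthogonal to every task-specific feature column."""
    cross = cross_correlation(shared, specific)
    return sum(v * v for row in cross for v in row)


def total_loss(
    task_losses: Sequence[float],
    shared: Matrix,
    specifics: Sequence[Matrix],
    lam: float = 0.1,  # hypothetical weight on the orthogonality term
) -> float:
    """Sum of per-task losses plus a weighted orthogonality penalty for each task subspace."""
    ortho = sum(orthogonality_penalty(shared, s) for s in specifics)
    return sum(task_losses) + lam * ortho


# Toy batch of 2 patients with 1-dimensional subspaces (illustrative values only).
shared = [[1.0], [1.0]]
print(orthogonality_penalty(shared, [[1.0], [-1.0]]))  # orthogonal columns
print(orthogonality_penalty(shared, [[1.0], [1.0]]))   # fully aligned columns
```

In a full model the matrices would be Transformer outputs and the penalty would be computed with a tensor library so gradients flow back into the encoder; the pure-Python version only shows the arithmetic. Some variants normalize each feature column first so the penalty measures correlation rather than scale.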

Problem

Research questions and friction points this paper is trying to address.

multi-task learning

multimodal clinical data

representation disentanglement

shared representation

task-specific representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Orthogonal Task Decomposition

Multi-task Learning

Multimodal Fusion