Disentangling Shared and Task-Specific Representations from Multi-Modal Clinical Data

📅 2026-05-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of negative transfer in multitask learning with multimodal clinical data, which hinders effective modeling of related yet heterogeneous clinical outcomes. To overcome this limitation, the authors propose a unified Transformer-based multitask framework incorporating an Orthogonal Task Decomposition (OrthTD) mechanism. This approach explicitly decouples shared and task-specific subspaces at the representation level and enforces geometric orthogonality constraints to suppress redundancy and isolate task-unique signals. Evaluated on data from 12,430 surgical patients for predicting four distinct clinical outcomes, the model achieves an average AUC of 87.5% and an average AUPRC of 37.2%, consistently outperforming existing tabular and multitask methods, with the largest gains in AUPRC, the metric most sensitive to rare events.
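To make the AUPRC figure easier to interpret, the sketch below computes a step-wise average precision, one common estimator of AUPRC. It is an illustration only: the source does not say which estimator the authors used, and the function name `average_precision` and the toy labels and scores are hypothetical. A useful reference point is that a random ranking scores an AUPRC roughly equal to the positive-class prevalence, so for rare outcomes an AUPRC well below the AUC can still represent a large improvement over chance.

```python
import numpy as np

def average_precision(y_true, scores):
    """Step-wise average precision: mean of precision@k over the ranks k of the positives."""
    y_true = np.asarray(y_true, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Rank samples from highest to lowest predicted risk.
    order = np.argsort(-scores, kind="stable")
    y_sorted = y_true[order]
    # Precision after each rank = true positives so far / samples so far.
    true_positives = np.cumsum(y_sorted)
    precision_at_k = true_positives / np.arange(1, len(y_sorted) + 1)
    # Average the precision values at the ranks where a positive occurs.
    return float(np.sum(precision_at_k * y_sorted) / np.sum(y_sorted))

# Toy, made-up example: 2 positives among 5 patients (not data from the study).
toy_labels = [0, 0, 1, 0, 1]
toy_scores = [0.10, 0.40, 0.35, 0.80, 0.90]
toy_ap = average_precision(toy_labels, toy_scores)
toy_prevalence = float(np.mean(toy_labels))  # approximate chance-level AUPRC
```

Comparing `toy_ap` against `toy_prevalence` rather than against 1.0 gives a fairer sense of how much a model improves on chance when events are rare.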
📝 Abstract
Real-world clinical data is inherently multimodal, providing complementary evidence that mirrors the practical necessity of jointly assessing multiple related outcomes. Although multi-task learning can improve efficiency by sharing information across outcomes, existing approaches often fail to balance shared representation learning with outcome-specific modeling. Hard parameter sharing can trigger negative transfer when task gradients conflict, while flexible sharing may still entangle shared and task-specific signals. To address this, we propose a multi-task framework built on a unified Transformer for multimodal fusion, augmented with Orthogonal Task Decomposition (OrthTD), which splits patient representations into shared and task-specific subspaces and imposes a geometric orthogonality constraint to reduce redundancy and isolate task-specific signals. We evaluated OrthTD on a real-world cohort of 12,430 surgical patients for predicting four outcomes. OrthTD achieved an average AUC (area under the receiver operating characteristic curve) of 87.5% and an average AUPRC (area under the precision-recall curve) of 37.2%, consistently outperforming advanced tabular and multi-task methods. Notably, OrthTD achieved substantial gains in AUPRC, indicating superior performance in identifying rare events within imbalanced clinical data. These results suggest that enforcing non-redundant shared and task-specific representations can improve multi-outcome prediction from multimodal clinical data.
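As a rough illustration of what an orthogonality constraint between shared and task-specific representations can look like, the sketch below penalizes the squared cosine similarity between the two vectors for each patient. This is an assumption made for illustration, not the authors' formulation: the abstract does not give the exact form of the OrthTD constraint, and the linear projections, dimensions, and names (`orthogonality_penalty`, `W_shared`, `W_tasks`) are hypothetical.

```python
import numpy as np

def orthogonality_penalty(shared, specific, eps=1e-12):
    """Mean squared cosine similarity between paired rows of two (batch, dim) arrays.

    The value is 0 when every shared vector is orthogonal to its task-specific
    counterpart and 1 when every pair is collinear.
    """
    shared_unit = shared / (np.linalg.norm(shared, axis=1, keepdims=True) + eps)
    specific_unit = specific / (np.linalg.norm(specific, axis=1, keepdims=True) + eps)
    cosine = np.sum(shared_unit * specific_unit, axis=1)
    return float(np.mean(cosine ** 2))

# Hypothetical setup: random stand-ins for fused patient representations.
rng = np.random.default_rng(0)
fused = rng.normal(size=(8, 16))                           # (patients, hidden dim)
W_shared = rng.normal(size=(16, 4))                        # shared-subspace projection
W_tasks = [rng.normal(size=(16, 4)) for _ in range(4)]     # one projection per outcome

shared_repr = fused @ W_shared
# Average the penalty over the four outcome-specific subspaces.
penalty = sum(
    orthogonality_penalty(shared_repr, fused @ W_task) for W_task in W_tasks
) / len(W_tasks)
```

In a training loop, a term like `penalty` would typically be added to the sum of per-outcome prediction losses with a weighting coefficient, so that the shared and task-specific subspaces are pushed toward carrying non-redundant information.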
Problem

Research questions and friction points this paper is trying to address.

multi-task learning
multimodal clinical data
representation disentanglement
shared representation
task-specific representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Orthogonal Task Decomposition
Multi-task Learning
Multimodal Fusion
Transformer
Representation Disentanglement
He Lyu
Meta
Compressed sensing, Optimization
Huolin Zeng
Department of Anesthesiology and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China
Junren Wang
Department of Anesthesiology and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China
Huazhen Yang
Department of Anesthesiology and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China
Linchao He
Department of National Key Laboratory of Fundamental Science on Synthetic Vision, Sichuan University, Chengdu, China
Yong Chen
Department of Anesthesiology and West China Biomedical Big Data Center, West China Hospital, Sichuan University, Chengdu, China; Med-X Center for Informatics, Sichuan University, Chengdu, China
Zhirui Li
Sichuan University, Chengdu, China
Andreas Maier
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Siming Bayer
Researcher, Pattern Recognition Lab, Friedrich-Alexander University
Medical Image Processing, Computer Guided Intervention, Machine Learning
Huan Song
Amazon AWS AI
Deep learning, machine learning, graph neural networks, time-series analysis