DualTime: A Dual-Adapter Multimodal Language Model for Time Series Representation

📅 2024-06-07
🏛️ arXiv.org
📈 Citations: 9
Influential: 1
📄 PDF
🤖 AI Summary
Existing medical multimodal models suffer from modality role imbalance: the temporal sequence dominates while the textual modality is marginalized, which hinders full exploitation of cross-modal complementarity. To address this, we propose the first text-temporal co-equal modeling framework tailored to clinical applications such as epilepsy diagnosis. Our approach features: (1) a dual-adapter architecture that models the time-series and textual modalities in parallel, each at primary level; (2) a novel lightweight adapter-token injection mechanism that aligns modality embeddings and enables efficient fine-tuning within a shared large language model; and (3) modality-aligned representation learning integrated with few-shot transfer strategies. Experiments demonstrate state-of-the-art performance on both supervised and unsupervised tasks, with markedly improved cross-modal generalization and few-shot adaptability.

📝 Abstract
The recent rapid development of language models (LMs) has attracted attention in the field of time series, including multimodal time series modeling. However, we note that current time series multimodal methods are biased, often assigning a primary role to one modality while the other assumes a secondary role. They overlook the mutual benefits and complementarity of different modalities. For example, in seizure diagnosis, relying solely on textual clinical reports makes it difficult to pinpoint the area and type of the disease, while electroencephalograms (EEGs) alone cannot provide an accurate diagnosis without considering the symptoms. In this study, based on the complementary information mining of time series multimodal data, we propose DualTime, a Dual-adapter multimodal language model for Time series representation that implements temporal-primary and textual-primary modeling simultaneously. By injecting lightweight adaptation tokens, the LM pipeline shared by the dual adapters encourages embedding alignment and achieves efficient fine-tuning. Empirically, our method outperforms state-of-the-art models in both supervised and unsupervised settings, highlighting the complementary benefits of different modalities. In addition, we conduct few-shot label transfer experiments, which further verify the transferability and expressiveness of our proposed DualTime.
Problem

Research questions and friction points this paper is trying to address.

Addresses modality-role bias in medical time-series/text multimodal learning approaches
Proposes dual-adapter model for balanced textual-temporal multimodal learning
Enhances cross-modal interaction and task-specific information capture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual adapters enable primary role switching
Lightweight tokens for high-level modality fusion
Shared pipeline reduces computational resources
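The dual-adapter, shared-pipeline design summarized above can be sketched roughly as follows. This is a minimal NumPy illustration under loose assumptions, not the paper's implementation: each lightweight adapter maps a pooled modality summary to a few adaptation tokens, and a stand-in for the frozen shared LM consumes the token sequence twice, once with the time series primary and once with the text primary. All names, dimensions, and the pooling stand-in are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16          # shared embedding dimension (assumed)
N_TOKENS = 4    # adaptation tokens produced per adapter (assumed)

class Adapter:
    """Lightweight adapter: maps a modality summary vector to N_TOKENS tokens."""
    def __init__(self, in_dim):
        self.W = rng.standard_normal((in_dim, N_TOKENS * D)) * 0.01

    def __call__(self, x):
        return (x @ self.W).reshape(N_TOKENS, D)

def shared_lm(tokens):
    """Stand-in for the frozen, shared LM pipeline: position-weighted pooling,
    so the token order (i.e., which modality is primary) affects the output."""
    w = np.linspace(1.0, 0.5, len(tokens))[:, None]
    return (tokens * w).sum(axis=0) / w.sum()

ts_summary = rng.standard_normal(8)    # e.g. pooled EEG features (illustrative)
txt_summary = rng.standard_normal(8)   # e.g. pooled clinical-report embedding

ts_adapter, txt_adapter = Adapter(8), Adapter(8)

# Temporal-primary branch: time-series tokens lead, text tokens follow.
ts_primary = shared_lm(np.vstack([ts_adapter(ts_summary),
                                  txt_adapter(txt_summary)]))
# Textual-primary branch: the roles are swapped through the same shared LM.
txt_primary = shared_lm(np.vstack([txt_adapter(txt_summary),
                                   ts_adapter(ts_summary)]))

# Both primary-level views are kept, reflecting co-equal modeling.
fused = np.concatenate([ts_primary, txt_primary])
print(fused.shape)  # (32,)
```

Because only the two small adapter matrices would be trained while the LM pipeline is shared and frozen, the parameter and compute overhead stays low, which matches the "shared pipeline reduces computational resources" point above.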
Weiqi Zhang
Tsinghua University
3D Computer Vision · Generative Model

Jiexia Ye
Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China

Ziyue Li
CS PhD, University of Maryland
Machine learning

Jia Li
Hong Kong University of Science and Technology, Hong Kong SAR, China; Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China

F. Tsung
Hong Kong University of Science and Technology, Hong Kong SAR, China; Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China