DualTime: A Dual-Adapter Multimodal Language Model for Time Series Representation

📅 2024-06-07
🏛️ arXiv.org
📈 Citations: 9
Influential: 1
📄 PDF
🤖 AI Summary
Existing medical multimodal models suffer from modality role imbalance: the temporal sequence dominates while the textual modality is marginalized, which hinders full exploitation of cross-modal complementarity. To address this, we propose the first text-temporal co-equal modeling framework tailored to clinical applications such as epilepsy diagnosis. Our approach features: (1) a dual-adapter architecture that models the time-series and textual modalities in parallel, each at primary level; (2) a novel lightweight adapter-token injection mechanism that aligns modality embeddings and enables efficient fine-tuning within a shared large language model; and (3) modality-aligned representation learning integrated with few-shot transfer strategies. Experiments demonstrate state-of-the-art performance on both supervised and unsupervised tasks, with markedly improved cross-modal generalization and few-shot adaptability.

📝 Abstract
The recent rapid development of language models (LMs) has attracted attention in the field of time series, including multimodal time series modeling. However, we note that current time series multimodal methods are biased, often assigning a primary role to one modality while the other assumes a secondary role. They overlook the mutual benefits and complementarity of different modalities. For example, in seizure diagnosis, relying solely on textual clinical reports makes it difficult to pinpoint the area and type of the disease, while electroencephalograms (EEGs) alone cannot provide an accurate diagnosis without considering the symptoms. In this study, based on the complementary information mining of time series multimodal data, we propose DualTime, a Dual-adapter multimodal language model for Time series representation that implements temporal-primary and textual-primary modeling simultaneously. By injecting lightweight adaptation tokens, the LM pipeline shared by the dual adapters encourages embedding alignment and achieves efficient fine-tuning. Empirically, our method outperforms state-of-the-art models in both supervised and unsupervised settings, highlighting the complementary benefits of different modalities. In addition, we conduct few-shot label transfer experiments, which further verify the transferability and expressiveness of our proposed DualTime.
Problem

Research questions and friction points this paper is trying to address.

Addresses modality-role bias in medical time-series/text multimodal learning approaches
Proposes dual-adapter model for balanced textual-temporal multimodal learning
Enhances cross-modal interaction and task-specific information capture
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual adapters enable primary role switching
Lightweight tokens for high-level modality fusion
Shared pipeline reduces computational resources
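The dual-adapter, shared-pipeline design summarized above can be sketched roughly as follows. This is a minimal NumPy illustration under loose assumptions, not the paper's implementation: each lightweight adapter maps a pooled modality summary to a few adaptation tokens, and a stand-in for the frozen shared LM consumes the token sequence twice, once with the time series primary and once with the text primary. All names, dimensions, and the pooling stand-in are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16          # shared embedding dimension (assumed)
N_TOKENS = 4    # adaptation tokens produced per adapter (assumed)

class Adapter:
    """Lightweight adapter: maps a modality summary vector to N_TOKENS tokens."""
    def __init__(self, in_dim):
        self.W = rng.standard_normal((in_dim, N_TOKENS * D)) * 0.01

    def __call__(self, x):
        return (x @ self.W).reshape(N_TOKENS, D)

def shared_lm(tokens):
    """Stand-in for the frozen, shared LM pipeline: position-weighted pooling,
    so the token order (i.e., which modality is primary) affects the output."""
    w = np.linspace(1.0, 0.5, len(tokens))[:, None]
    return (tokens * w).sum(axis=0) / w.sum()

ts_summary = rng.standard_normal(8)    # e.g. pooled EEG features (illustrative)
txt_summary = rng.standard_normal(8)   # e.g. pooled clinical-report embedding

ts_adapter, txt_adapter = Adapter(8), Adapter(8)

# Temporal-primary branch: time-series tokens lead, text tokens follow.
ts_primary = shared_lm(np.vstack([ts_adapter(ts_summary),
                                  txt_adapter(txt_summary)]))
# Textual-primary branch: the roles are swapped through the same shared LM.
txt_primary = shared_lm(np.vstack([txt_adapter(txt_summary),
                                   ts_adapter(ts_summary)]))

# Both primary-level views are kept, reflecting co-equal modeling.
fused = np.concatenate([ts_primary, txt_primary])
print(fused.shape)  # (32,)
```

Because only the two small adapter matrices would be trained while the LM pipeline is shared and frozen, the parameter and compute overhead stays low, which matches the "shared pipeline reduces computational resources" point above.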
Weiqi Zhang
Tsinghua University
3D Computer Vision · Generative Model

Jiexia Ye
Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China

Ziyue Li
CS PhD, University of Maryland
Machine learning

Jia Li
Hong Kong University of Science and Technology, Hong Kong SAR, China; Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China

F. Tsung
Hong Kong University of Science and Technology, Hong Kong SAR, China; Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China