🤖 AI Summary
Medical time-series (MedTS) classification suffers from poor generalization across individuals due to inter-subject heterogeneity, and existing methods are constrained by modality-specific inductive biases that hinder learning domain-invariant representations. To address this, we propose TS-P$^2$CL, a plug-and-play dual-contrastive learning framework: first, one-dimensional physiological signals are transformed into two-dimensional pseudo-images to enable vision-guided representation learning; second, pre-trained vision models extract generic features, which are jointly optimized via intra-modal consistency and cross-modal alignment to model temporal dynamics and visual semantics in tandem. This is the first vision-guided paradigm explicitly designed for MedTS classification, effectively mitigating subject-specific bias. Evaluated on six benchmark datasets under both subject-dependent and subject-independent settings, our method outperforms 14 state-of-the-art approaches, with significant improvements in classification accuracy and cross-subject generalization.
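The summary above does not fix a specific 1D-to-2D encoding. As one illustration of how a physiological signal can become a pseudo-image for a vision backbone, the sketch below uses a Gramian Angular Summation Field (GASF); the GASF choice and the helper name `to_pseudo_image` are assumptions for illustration, not the paper's stated transform.

```python
# Minimal sketch of a 1D-to-2D pseudo-image transform (GASF variant, assumed).
import numpy as np

def to_pseudo_image(signal: np.ndarray) -> np.ndarray:
    """Map a 1D signal of length T to a T x T pseudo-image via GASF."""
    # Rescale to [-1, 1] so samples can be read as cosines of angles.
    s_min, s_max = signal.min(), signal.max()
    x = 2.0 * (signal - s_min) / (s_max - s_min + 1e-8) - 1.0
    phi = np.arccos(np.clip(x, -1.0, 1.0))  # polar-coordinate angles
    # GASF: pairwise cos(phi_i + phi_j) renders temporal correlations as 2D.
    return np.cos(phi[:, None] + phi[None, :])

# Example: a 128-sample segment becomes a 128 x 128 image that a pre-trained
# vision encoder can consume (after channel replication / resizing).
img = to_pseudo_image(np.sin(np.linspace(0, 8 * np.pi, 128)))
print(img.shape)  # (128, 128)
```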
📝 Abstract
Medical time series (MedTS) classification is pivotal for intelligent healthcare, yet its efficacy is severely limited by poor cross-subject generalization due to profound cross-individual heterogeneity. Despite advances in architectural innovations and transfer learning techniques, current methods remain constrained by modality-specific inductive biases that limit their ability to learn universally invariant representations. To overcome this, we propose TS-P$^2$CL, a novel plug-and-play framework that leverages the universal pattern recognition capabilities of pre-trained vision models. We introduce a vision-guided paradigm that transforms 1D physiological signals into 2D pseudo-images, establishing a bridge to the visual domain and enabling implicit access to rich semantic priors learned from natural images. Within this unified space, we employ a dual-contrastive learning strategy: intra-modal consistency enforces temporal coherence, while cross-modal alignment ties time-series dynamics to visual semantics, thereby mitigating individual-specific biases and learning robust, domain-invariant features. Extensive experiments on six MedTS datasets demonstrate that TS-P$^2$CL consistently outperforms fourteen competing methods in both subject-dependent and subject-independent settings.
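For concreteness, here is a minimal PyTorch sketch of a dual-contrastive objective of the kind the abstract describes, assuming an InfoNCE-style loss for both the intra-modal and cross-modal terms; the function names, temperature `tau`, and weighting `lam` are illustrative assumptions rather than the paper's exact formulation.

```python
# Hedged sketch: intra-modal InfoNCE over two augmented time-series views,
# plus a cross-modal term aligning time-series and vision embeddings.
import torch
import torch.nn.functional as F

def info_nce(z_a: torch.Tensor, z_b: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: matched rows of z_a and z_b are positive pairs."""
    z_a, z_b = F.normalize(z_a, dim=1), F.normalize(z_b, dim=1)
    logits = z_a @ z_b.t() / tau                       # cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def dual_contrastive_loss(z_t1, z_t2, z_v, lam: float = 0.5) -> torch.Tensor:
    """z_t1, z_t2: embeddings of two augmented 1D views from the time-series
    encoder; z_v: embeddings of the pseudo-images from the pre-trained
    vision model. lam balances the two terms (assumed value)."""
    intra = info_nce(z_t1, z_t2)   # intra-modal temporal consistency
    cross = info_nce(z_t1, z_v)    # cross-modal time-vision alignment
    return intra + lam * cross
```

In this reading, the intra-modal term supplies the temporal coherence signal while the cross-modal term pulls time-series embeddings toward the vision model's semantic space, which is how the framework can inherit priors from natural-image pre-training.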