Benchmarking Video Foundation Models for Remote Parkinson's Disease Screening

📅 2026-02-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the lack of systematic, cross-task evaluation of video foundation models for remote Parkinson's disease screening. It assesses seven prominent video foundation models (including VideoPrism, V-JEPA, and ViViT) on a large-scale clinical video dataset, using frozen embeddings with linear classification heads across multiple clinical tasks. The results show area under the curve (AUC) scores ranging from 76.4% to 85.3% and specificity as high as 90.3%, but sensitivity remains relatively low (43.2–57.3%). The findings indicate that which tasks are most informative depends strongly on the model architecture, offering guidance for model selection and future optimization in remote neurological disease monitoring.
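To make the gap between the reported specificity and sensitivity concrete, here is a minimal sketch of how these screening metrics are computed from binary predictions. The function name `screening_metrics` and the toy labels are hypothetical illustrations, not data or code from the study.

```python
def screening_metrics(y_true, y_pred):
    """Return (sensitivity, specificity, accuracy) for binary labels, 1 = PD."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # PD correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # PD missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # healthy correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # healthy wrongly flagged
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / len(pairs)
    return sensitivity, specificity, accuracy


# Hypothetical toy example: 4 participants with PD, 6 without.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
sens, spec, acc = screening_metrics(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

In this toy case half of the PD cases are missed while most healthy participants are correctly cleared, the same qualitative pattern as high specificity paired with low sensitivity.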

📝 Abstract

Remote, video-based assessments offer a scalable pathway for Parkinson's disease (PD) screening. While traditional approaches rely on handcrafted features mimicking clinical scales, recent advances in video foundation models (VFMs) enable representation learning without task-specific customization. However, the comparative effectiveness of different VFM architectures across diverse clinical tasks remains poorly understood. We present a large-scale systematic study using a novel video dataset from 1,888 participants (727 with PD), comprising 32,847 videos across 16 standardized clinical tasks. We evaluate seven state-of-the-art VFMs, including VideoPrism, V-JEPA, ViViT, and VideoMAE, to determine their robustness in clinical screening. By evaluating frozen embeddings with a linear classification head, we demonstrate that task saliency is highly model-dependent: VideoPrism excels in capturing visual speech kinematics (no audio) and facial expressivity, while V-JEPA proves superior for upper-limb motor tasks. Notably, TimeSformer remains highly competitive for rhythmic tasks like finger tapping. Our experiments yield AUCs of 76.4-85.3% and accuracies of 71.5-80.6%. While high specificity (up to 90.3%) suggests strong potential for ruling out healthy individuals, the lower sensitivity (43.2-57.3%) highlights the need for task-aware calibration and integration of multiple tasks and modalities. Overall, this work establishes a rigorous baseline for VFM-based PD screening and provides a roadmap for selecting suitable tasks and architectures in remote neurological monitoring. Code and anonymized structured data are publicly available: https://anonymous.4open.science/r/parkinson_video_benchmarking-A2C5
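The frozen-embedding protocol described in the abstract can be illustrated with a small sketch: embeddings are treated as fixed inputs and only a linear (logistic-regression) head is trained, then scored by AUC. This is not the authors' code. The embeddings below are synthetic Gaussian vectors standing in for VFM outputs, and every name (`train_linear_head`, `make_split`, the dimension of 8, the learning rate) is an assumption chosen for illustration.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(x, w, b):
    """Linear head output (probability of PD) for one frozen embedding."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_linear_head(X, y, lr=0.1, epochs=200):
    """Fit weights with batch gradient descent; the embeddings X are never updated."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, t in zip(X, y):
            err = score(x, w, b) - t
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def auc(y_true, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def make_split(rng, n, dim=8, shift=0.5):
    """Synthetic stand-in for frozen embeddings: class 1 has a shifted mean."""
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(shift * label, 1.0) for _ in range(dim)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_split(rng, 200)
X_test, y_test = make_split(rng, 100)
w, b = train_linear_head(X_train, y_train)
test_auc = auc(y_test, [score(x, w, b) for x in X_test])
print(f"held-out AUC of the linear head: {test_auc:.3f}")
```

Because only the head is trained, differences in held-out AUC between backbones reflect how linearly separable each model's embeddings are for a given task, which is the quantity this kind of benchmark compares.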

Problem

Research questions and friction points this paper is trying to address.

Parkinson's disease

video foundation models

remote screening

clinical assessment

benchmarking

Innovation

Methods, ideas, or system contributions that make the work stand out.

Video Foundation Models

Parkinson's Disease Screening

Remote Assessment

Model Benchmarking

Clinical Video Analysis

🔎 Similar Papers

Accessible, At-Home Detection of Parkinson's Disease via Multi-task Video Analysis

2024-06-21 · AAAI Conference on Artificial Intelligence · Citations: 0