Multivariate Gaussian Representation Learning for Medical Action Evaluation

📅 2025-11-13

🤖 AI Summary

Fine-grained assessment of medical actions faces three key challenges: scarce data, stringent accuracy requirements, and insufficient spatiotemporal modeling of rapid actions. To address them, we introduce CPREval-6k, the first multi-view, multi-label benchmark designed specifically for cardiopulmonary resuscitation (CPR) evaluation. We further propose GaussMedAct, a framework that models joint motion as time-varying anisotropic multivariate Gaussian distributions to preserve motion semantics. It also uses a dual-stream spatial encoder that combines Cartesian joint coordinates with displacement (bone) vectors, improving noise robustness and supporting fine-grained representation learning. On CPREval-6k, GaussMedAct achieves 92.1% Top-1 accuracy, outperforming ST-GCN by 5.9% while requiring only 10% of its FLOPs and running in real time. Cross-dataset experiments also indicate improved generalization.

📝 Abstract

Fine-grained action evaluation in medical vision faces unique challenges: comprehensive datasets are lacking, precision requirements are stringent, and the spatiotemporal dynamics of very rapid actions are insufficiently modeled. To support development and evaluation, we introduce CPREval-6k, a multi-view, multi-label medical action benchmark containing 6,372 expert-annotated videos with 22 clinical labels. On this dataset, we present GaussMedAct, a multivariate Gaussian encoding framework that advances medical motion analysis through adaptive spatiotemporal representation learning. Multivariate Gaussian Representation projects joint motions into a temporally scaled multi-dimensional space and decomposes actions into adaptive 3D Gaussians that serve as tokens. These tokens preserve motion semantics through anisotropic covariance modeling while remaining robust to spatiotemporal noise. Hybrid Spatial Encoding uses a Cartesian and Vector dual-stream strategy to exploit skeletal information in the form of joint and bone features. The proposed method achieves 92.1% Top-1 accuracy with real-time inference on the benchmark, outperforming the ST-GCN baseline by 5.9% in accuracy with only 10% of its FLOPs. Cross-dataset experiments confirm the method's superior robustness.
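The abstract does not spell out how the Gaussian tokens are constructed. As a rough illustration only, the following sketch assumes 2D keypoints of shape (T, J, 2), appends a scaled frame index as a third axis (one possible reading of the "temporally scaled" space), and fits one Gaussian per joint per fixed, non-overlapping window using the sample mean and full (anisotropic) covariance. The function name `gaussian_tokens`, the window length, `time_scale`, and `eps` are hypothetical; the paper describes its Gaussians as adaptive (presumably learned), which this fixed statistical fit does not attempt.

```python
import numpy as np


def gaussian_tokens(skel, window=8, time_scale=0.1, eps=1e-6):
    """Fit one 3D Gaussian per joint per non-overlapping temporal window.

    skel: array of shape (T, J, 2) holding 2D joint coordinates (assumed layout).
    Returns an array of shape (T // window, J, 9): the 3D mean followed by the
    6 upper-triangular entries of the full (anisotropic) covariance matrix.
    Frames beyond the last full window are dropped.
    """
    T, J, _ = skel.shape
    n_win = T // window
    iu = np.triu_indices(3)  # order: (0,0),(0,1),(0,2),(1,1),(1,2),(2,2)
    tokens = np.empty((n_win, J, 9))
    for w in range(n_win):
        frames = np.arange(w * window, (w + 1) * window)
        seg = skel[frames]  # (window, J, 2)
        # Third axis: global frame index scaled by time_scale (hypothetical choice).
        t = np.broadcast_to((frames * time_scale)[:, None, None], (window, J, 1))
        pts = np.concatenate([seg, t], axis=-1)  # (window, J, 3)
        for j in range(J):
            p = pts[:, j, :]  # (window, 3) samples for joint j
            mu = p.mean(axis=0)
            # Full sample covariance; eps keeps it positive definite for static joints.
            cov = np.cov(p, rowvar=False) + eps * np.eye(3)
            tokens[w, j] = np.concatenate([mu, cov[iu]])
    return tokens
```

Because time is one of the Gaussian's axes, the x-t covariance entry (index 5) divided by the t variance (index 8), multiplied back by `time_scale`, approximately recovers a joint's per-frame velocity along x. This is one way an anisotropic covariance can retain motion direction that an isotropic one would discard.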

Problem

Evaluating fine-grained medical actions with limited datasets

Modeling rapid spatiotemporal dynamics in clinical procedures

Achieving precise motion analysis under stringent medical requirements

Innovation

Multivariate Gaussian encoding for medical motion analysis

Hybrid spatial encoding with Cartesian and Vector streams

Adaptive spatiotemporal representation learning with Gaussian tokens
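Hybrid spatial encoding is described only as a Cartesian stream plus a Vector stream over joint and bone features. A minimal sketch under our own assumptions: the Cartesian stream holds root-relative joint coordinates, and the Vector stream holds bone vectors (each joint minus its parent), as in common joint/bone two-stream skeleton models. The skeleton layout `PARENTS`, the root choice, and the concatenation in `fuse_streams` are hypothetical, and any learned layers the paper applies to each stream are omitted.

```python
import numpy as np

# Hypothetical 5-joint skeleton used only for illustration (not the paper's layout):
# 0 neck (root), 1 right shoulder, 2 right elbow, 3 right wrist, 4 left shoulder.
# PARENTS[j] is the parent of joint j; the root is its own parent.
PARENTS = np.array([0, 0, 1, 2, 0])


def hybrid_streams(joints, parents=PARENTS, root=0):
    """Split a skeleton sequence of shape (T, J, C) into two spatial streams.

    cartesian: joint coordinates relative to the root joint.
    vector:    bone vectors (child minus parent); zero at the root.
    """
    cartesian = joints - joints[:, root:root + 1, :]
    vector = joints - joints[:, parents, :]
    return cartesian, vector


def fuse_streams(cartesian, vector):
    """Naive fusion by channel concatenation: two (T, J, C) -> (T, J, 2C)."""
    return np.concatenate([cartesian, vector], axis=-1)
```

Both streams are invariant to a global translation of the skeleton; the bone stream additionally exposes limb lengths and orientations directly, which is the usual motivation for pairing it with joint coordinates.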
