🤖 AI Summary
This work proposes the first end-to-end patient activity recognition framework that integrates explicit, learnable logical rules, enabling interpretable reasoning beyond conventional classification. By fusing multi-view contextual features and employing neural-guided differentiable logic rules, the model maps visual cues to symbolic predicates and provides auditable "why" explanations for its predictions, while also supporting counterfactual intervention analysis. Unlike existing approaches that focus solely on predictive accuracy, this method introduces symbolic reasoning into clinical activity recognition, offering both high performance and strong interpretability. Evaluated on the VAST and OmniFall clinical benchmarks, the framework significantly outperforms state-of-the-art vision-language models and Transformer-based baselines, demonstrating its effectiveness in delivering accurate yet explainable insights into patient behavior.
📝 Abstract
Patient Activity Recognition (PAR) in clinical settings uses activity data to improve safety and quality of care. Although significant progress has been made, current models mainly identify which activity is occurring. They often spatially compose sparse visual cues using global and local attention mechanisms, yet learn only implicit logical patterns because of their purely neural pipelines. Advancing clinical safety requires methods that can infer why a set of visual cues implies a risk, and how those cues can be compositionally reasoned over with explicit logic beyond mere classification. To address this, we propose Logi-PAR, the first Logic-Infused Patient Activity Recognition framework, which integrates contextual fact fusion as a multi-view primitive extractor and injects neural-guided differentiable rules. Our method automatically learns rules from visual cues, optimizing them end-to-end while allowing implicitly emerging patterns to be explicitly labelled during training. To the best of our knowledge, Logi-PAR is the first framework to recognize patient activity by applying learnable logic rules to symbolic mappings. It produces auditable "why" explanations as rule traces and supports counterfactual interventions (e.g., risk would decrease by 65% if assistance were present). Extensive evaluation on clinical benchmarks (VAST and OmniFall) demonstrates state-of-the-art performance, significantly outperforming vision-language models and Transformer baselines. The code is available at: https://github.com/zararkhan985/Logi-PAR.git
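To make the idea of differentiable logic rules and counterfactual intervention more concrete, here is a minimal, hypothetical sketch (not the paper's actual implementation): symbolic predicates are represented as soft truth values in [0, 1], a learnable weighted product t-norm acts as a soft AND to fire a rule, and a counterfactual is evaluated by forcing one predicate to false. All predicate names, weights, and values below are illustrative assumptions.

```python
import numpy as np

def soft_and(truths, weights):
    """Weighted product t-norm: each predicate contributes truth**weight.

    A weight of 1 means the predicate fully participates in the rule;
    a weight of 0 makes it irrelevant (its factor becomes 1). Because the
    operation is differentiable, the weights can be learned end-to-end.
    """
    truths = np.asarray(truths, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.prod(truths ** weights))

# Illustrative rule: fall_risk <- standing AND unassisted AND near_bed_edge
truths = [0.9, 0.8, 0.95]   # soft predicate truth values (hypothetical)
weights = [1.0, 1.0, 0.5]   # "learned" rule weights (hypothetical)
risk = soft_and(truths, weights)

# Counterfactual intervention: set "unassisted" to false
# (i.e., assistance is present) and re-evaluate the rule.
truths_cf = [0.9, 0.0, 0.95]
risk_cf = soft_and(truths_cf, weights)
```

In this toy setting, the rule trace (which predicates fired, with what weights) serves as the auditable "why" explanation, and the drop from `risk` to `risk_cf` quantifies the effect of the intervention.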