🤖 AI Summary
Current approaches to medical video diagnosis rely too heavily on visual appearance, do not incorporate clinical prior knowledge, and reason only from what is observed, without counterfactual comparison. To address these limitations, this work proposes MedVCR, a framework that brings clinical rule-guided counterfactual reasoning to medical video diagnosis. MedVCR uses a diffusion model to generate counterfactual videos under specified pathological states, and uses them to learn representations that are temporally consistent and separable by pathology. It adds a dual prediction mechanism that combines video-level assessment with frame-level counterfactual analysis to emulate clinical diagnostic reasoning. Evaluated on colposcopy (fully supervised) and colonoscopy (weakly supervised) tasks, MedVCR outperforms leading baselines by 2.6%–10.2%, and ablation studies support the contribution of each component.
📝 Abstract
Medical video diagnosis involves inferring clinical decisions from dynamic tissue responses over the course of an examination. Existing methods rely on an end-to-end learning paradigm that i) focuses on appearance rather than pathology, ii) lacks clinical priors, and iii) reasons solely from observations without counterfactual comparison. This work introduces MedVCR, a counterfactual reasoning framework that mimics clinical diagnostic thinking. MedVCR comprises three components: a Counterfactual Generator that uses a diffusion model to synthesize tissue evolution under specified pathological states; a Counterfactual Representation Learning module that encodes diagnostic knowledge through clinical rules (i.e., temporal consistency, pathological separability, and counterfactual alignment); and a Dual Diagnostic Prediction strategy that integrates video-level assessment with frame-level counterfactual analysis. MedVCR is evaluated under both fully supervised (e.g., colposcopy) and weakly supervised (e.g., colonoscopy) video diagnosis settings, yielding performance gains of 2.6%–10.2% over leading baselines. Comprehensive ablation studies further validate the effectiveness of each component. The code will be released.
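The abstract names three clinical rules and a dual prediction strategy but gives no formulas. The sketch below shows one plausible way such terms could be written down, purely for intuition. It is not the authors' implementation: every function name, the cosine-based penalties, the `margin` value, and the `alpha` blending weight are assumptions introduced here, and the embeddings are assumed to be plain NumPy arrays.

```python
import numpy as np

# Hypothetical illustration only: none of these names or formulas come from the paper.

def _cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def temporal_consistency(frame_embs):
    """Mean squared change between consecutive frame embeddings, shape (T, D)."""
    diffs = frame_embs[1:] - frame_embs[:-1]
    return float(np.mean(np.sum(diffs ** 2, axis=1)))

def pathological_separability(emb_a, emb_b, margin=0.5):
    """Hinge penalty when embeddings of two different pathological states are too similar."""
    return max(0.0, _cosine(emb_a, emb_b) - margin)

def counterfactual_alignment(real_emb, cf_emb):
    """Distance between a real video and a counterfactual generated under the same state."""
    return 1.0 - _cosine(real_emb, cf_emb)

def dual_prediction(video_probs, frame_probs, alpha=0.5):
    """Blend video-level class probabilities with the mean of frame-level ones."""
    return alpha * video_probs + (1.0 - alpha) * frame_probs.mean(axis=0)

# Toy usage with made-up numbers.
video_probs = np.array([0.8, 0.2])
frame_probs = np.array([[0.6, 0.4], [0.4, 0.6]])
blended = dual_prediction(video_probs, frame_probs)
```

In this toy formulation, a static video has zero temporal-consistency penalty, identical embeddings of different pathologies incur the largest separability penalty, and the blended prediction remains a valid probability distribution as long as both inputs are.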