Clinically-Grounded Counterfactual Reasoning for Medical Video Diagnosis

📅 2026-05-25
🤖 AI Summary
Current medical video diagnosis methods rely too heavily on visual appearance features, do not integrate clinical prior knowledge, and struggle with counterfactual reasoning. To address these limitations, this work proposes MedVCR, a framework that introduces clinical rule-guided counterfactual reasoning into medical video diagnosis. MedVCR uses a diffusion model to generate counterfactual videos under specified pathological conditions and learns pathology-disentangled, temporally consistent representations from them. It further incorporates a dual-path prediction mechanism that operates at both the video level and the frame level to emulate clinical diagnostic reasoning. Evaluated on colposcopy (fully supervised) and colonoscopy (weakly supervised) tasks, MedVCR outperforms state-of-the-art methods by 2.6%–10.2%, with ablation studies confirming the contribution of each component.
📝 Abstract
Medical video diagnosis involves inferring clinical decisions from dynamic tissue responses over the course of an examination. Existing methods rely on an end-to-end learning paradigm that i) focuses on appearance rather than pathology, ii) lacks clinical priors, and iii) reasons solely from observations without counterfactual comparison. This work introduces MedVCR, a counterfactual reasoning framework that mimics clinical diagnostic thinking. MedVCR comprises three components: a Counterfactual Generator that synthesizes tissue evolution under specified pathological states using a diffusion-based approach; a Counterfactual Representation Learning module that encodes diagnostic knowledge through clinical rules (i.e., temporal consistency, pathological separability, and counterfactual alignment); and a Dual Diagnostic Prediction strategy that integrates video-level assessment with frame-level counterfactual analysis. MedVCR is evaluated under both fully supervised (e.g., colposcopy) and weakly supervised (e.g., colonoscopy) video diagnosis settings, yielding 2.6%–10.2% performance gains compared with leading baselines. Comprehensive ablation studies further validate the effectiveness of each component. The code will be released.
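The abstract names three clinical rules (temporal consistency, pathological separability, and counterfactual alignment) but does not give their formulas. The sketch below is one plausible reading of each rule as a loss over embedding vectors. Every function name, the hinge margin, and the choice of distance here are illustrative assumptions, not the authors' definitions.

```python
import math


def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def temporal_consistency(frames):
    """Assumed form: mean squared distance between consecutive frame embeddings."""
    pairs = list(zip(frames[:-1], frames[1:]))
    if not pairs:
        return 0.0
    return sum(_dist(a, b) ** 2 for a, b in pairs) / len(pairs)


def pathological_separability(embeddings, labels, margin=1.0):
    """Assumed form: hinge penalty on differently labeled pairs closer than `margin`."""
    penalties = [
        max(0.0, margin - _dist(embeddings[i], embeddings[j]))
        for i in range(len(embeddings))
        for j in range(i + 1, len(embeddings))
        if labels[i] != labels[j]
    ]
    return sum(penalties) / len(penalties) if penalties else 0.0


def counterfactual_alignment(observed, counterfactual):
    """Assumed form: cosine distance between an observed embedding and a
    generated counterfactual embedding of the same pathological state."""
    dot = sum(x * y for x, y in zip(observed, counterfactual))
    norm = math.sqrt(sum(x * x for x in observed)) * math.sqrt(
        sum(y * y for y in counterfactual)
    )
    return 1.0 - dot / (norm + 1e-8)
```

In a training loop, these three terms would typically be combined as a weighted sum alongside the diagnostic classification loss. The weights are hyperparameters that the abstract does not specify.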
Problem

Research questions and friction points this paper is trying to address.

medical video diagnosis
counterfactual reasoning
clinical priors
pathology-aware
diagnostic reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

counterfactual reasoning
medical video diagnosis
diffusion-based generation
clinical priors
representation learning
Authors

Jianzhe Gao
The State Key Lab of Brain-Machine Intelligence, Zhejiang University
Churan Wang
Peking University
medical image analysis, computational vision
Weiyi Zhang
Department of Gynecology and Obstetrics, 7th Medical Center of Chinese PLA General Hospital
Jianghua Li
Department of Gynecology and Obstetrics, 7th Medical Center of Chinese PLA General Hospital
Li-An Li
Department of Gynecology and Obstetrics, 7th Medical Center of Chinese PLA General Hospital
Wenguan Wang
Zhejiang University
Neural-Symbolic AI, Embodied AI, Autonomous Cars, Computer Vision, Artificial Intelligence
Yixin Zhu
Assistant Professor, Peking University
Computer Vision, Visual Reasoning, Human-Robot Teaming
Yizhou Wang
School of Computer Science, Peking University; State Key Lab of General AI, Peking University; Nat’l Eng. Research Center of Visual Technology