MedGen: Unlocking Medical Video Generation by Scaling Granularly-annotated Medical Videos

📅 2025-07-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Medical video generation faces two challenges: scarce high-quality annotated data and insufficient clinical accuracy. To address these, we introduce MedVideoCap-55K, the first large-scale medical video dataset with fine-grained annotations, comprising over 55,000 clinically diverse video clips, and propose MedGen, an open-source generative model built upon it. MedGen integrates video-text contrastive pretraining, a multimodal encoder-decoder architecture, and explicit medical knowledge alignment, augmented by rigorous human expert validation to ensure both visual fidelity and clinical credibility. Extensive experiments demonstrate that MedGen outperforms existing open-source methods across multiple medical video generation benchmarks, achieving visual quality and diagnostic plausibility comparable to commercial systems. Both the MedVideoCap-55K dataset and the MedGen codebase are open-sourced to advance research and clinical deployment in medical video generation.

📝 Abstract

Recent advances in video generation have shown remarkable progress in open-domain settings, yet medical video generation remains largely underexplored. Medical videos are critical for applications such as clinical training, education, and simulation, requiring not only high visual fidelity but also strict medical accuracy. However, current models often produce unrealistic or erroneous content when applied to medical prompts, largely due to the lack of large-scale, high-quality datasets tailored to the medical domain. To address this gap, we introduce MedVideoCap-55K, the first large-scale, diverse, and caption-rich dataset for medical video generation. It comprises over 55,000 curated clips spanning real-world medical scenarios, providing a strong foundation for training generalist medical video generation models. Built upon this dataset, we develop MedGen, which achieves leading performance among open-source models and rivals commercial systems across multiple benchmarks in both visual quality and medical accuracy. We hope our dataset and model can serve as a valuable resource and help catalyze further research in medical video generation. Our code and data are available at https://github.com/FreedomIntelligence/MedGen

Problem

Research questions and friction points this paper is trying to address.

Lack of large-scale medical video datasets for generation

Current models produce inaccurate medical video content

Need for high visual fidelity and medical accuracy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introducing MedVideoCap-55K dataset

Developing MedGen, a model for medical video generation

Achieving high visual quality and medical accuracy
