EchoLVFM: One-Step Video Generation via Latent Flow Matching for Echocardiogram Synthesis

📅 2026-03-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing echocardiography video generation methods rely on multi-step sampling and aggressive temporal normalization, which makes them slow and ill-suited to heterogeneous real-world data. This work proposes EchoLVFM, a one-step latent flow matching framework for efficient, controllable video synthesis conditioned on global clinical variables such as ejection fraction (EF). It also supports reconstruction and counterfactual generation from partially observed sequences. A masked conditioning strategy removes the fixed-length input constraint, so shorter sequences can be retained rather than discarded. Evaluated on the CAMUS dataset, the method samples approximately 50× more efficiently than multi-step flow baselines while preserving visual fidelity and precise EF control. Expert clinicians distinguished generated from real videos with only 57.9% accuracy, close to chance.
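The summary does not spell out how the mask conditioning works. One common way to keep variable-length clips in a fixed-length pipeline is to pad each clip and carry a binary validity mask, so that padded positions are ignored by the loss. The sketch below illustrates that general idea only; `pad_with_mask`, `masked_mse`, and the scalar "frames" are hypothetical stand-ins and are not taken from the EchoLVFM code.

```python
from typing import List, Sequence, Tuple


def pad_with_mask(frames: Sequence[float], target_len: int) -> Tuple[List[float], List[int]]:
    """Pad (or truncate) a clip to target_len.

    Returns the padded clip and a mask with 1 for real frames and 0 for padding.
    Each "frame" is a single float here; a real model would use latent tensors.
    """
    kept = list(frames[:target_len])
    n_pad = target_len - len(kept)
    return kept + [0.0] * n_pad, [1] * len(kept) + [0] * n_pad


def masked_mse(pred: Sequence[float], target: Sequence[float], mask: Sequence[int]) -> float:
    """Mean squared error computed over valid (mask == 1) positions only."""
    n_valid = sum(mask)
    if n_valid == 0:
        raise ValueError("mask has no valid positions")
    total = sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask))
    return total / n_valid


# A short clip (3 frames) kept in a pipeline that expects 5 frames.
short_clip = [0.2, 0.4, 0.6]
padded, mask = pad_with_mask(short_clip, target_len=5)

# Predictions in padded positions are arbitrary and do not affect the loss.
loss = masked_mse([0.2, 0.4, 0.6, 9.0, 9.0], padded, mask)
```

With this pattern a short clip contributes to training through its real frames only, which is the behaviour the summary describes as retaining shorter sequences rather than discarding them.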

📝 Abstract

Echocardiography is widely used for assessing cardiac function, where clinically meaningful parameters such as left-ventricular ejection fraction (EF) play a central role in diagnosis and management. Generative models capable of synthesising realistic echocardiogram videos with explicit control over such parameters are valuable for data augmentation, counterfactual analysis, and specialist training. However, existing approaches typically rely on computationally expensive multi-step sampling and aggressive temporal normalisation, limiting efficiency and applicability to heterogeneous real-world data. We introduce EchoLVFM, a one-step latent video flow-matching framework for controllable echocardiogram generation. Operating in the latent space, EchoLVFM synthesises temporally coherent videos in a single inference step, achieving a $\mathbf{\sim 50\times}$ improvement in sampling efficiency compared to multi-step flow baselines while maintaining visual fidelity. The model supports global conditioning on clinical variables, demonstrated through precise control of EF, and enables reconstruction and counterfactual generation from partially observed sequences. A masked conditioning strategy further removes fixed-length constraints, allowing shorter sequences to be retained rather than discarded. We evaluate EchoLVFM on the CAMUS dataset under challenging single-frame conditioning. Quantitative and qualitative results demonstrate competitive video quality, strong EF adherence, and 57.9% discrimination accuracy by expert clinicians, which is close to chance. These findings indicate that efficient, one-step flow matching can enable practical, controllable echocardiogram video synthesis without sacrificing fidelity. Code is available at: https://github.com/EngEmmanuel/EchoLVFM
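To make the efficiency claim concrete, the sketch below contrasts multi-step and one-step Euler sampling of a flow ODE and counts velocity evaluations. It is a toy illustration under stated assumptions, not the EchoLVFM sampler: `straight_velocity` is an analytic stand-in for a trained network, `TARGET` is a made-up latent, and the 50-step baseline is chosen only to mirror the reported ~50× figure (the abstract does not state the baseline's step count).

```python
from typing import Callable, List

Vec = List[float]


def euler_sample(velocity: Callable[[Vec, float], Vec], x0: Vec, n_steps: int) -> Vec:
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 with n_steps Euler steps."""
    x = list(x0)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        v = velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


TARGET: Vec = [1.0, -2.0, 0.5]  # hypothetical "data" latent


def straight_velocity(x: Vec, t: float) -> Vec:
    """Velocity of the straight path x_t = (1 - t) * x0 + t * x1, i.e. (x1 - x_t) / (1 - t).

    Only evaluated for t < 1, which euler_sample guarantees.
    """
    return [(ti - xi) / (1.0 - t) for ti, xi in zip(TARGET, x)]


calls = {"n": 0}


def counted_velocity(x: Vec, t: float) -> Vec:
    """Wrap the velocity field to count how often the 'network' is evaluated."""
    calls["n"] += 1
    return straight_velocity(x, t)


noise = [0.3, 0.1, -0.7]

multi = euler_sample(counted_velocity, noise, n_steps=50)
multi_calls = calls["n"]

calls["n"] = 0
one = euler_sample(counted_velocity, noise, n_steps=1)
one_calls = calls["n"]

print(multi_calls, one_calls)
```

Because the toy paths are perfectly straight, a single Euler step lands on the same endpoint as 50 steps while using one velocity evaluation instead of 50. A learned velocity field is generally not perfectly straight, so matching multi-step quality in one step is something a model has to be trained for.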

Problem

Research questions and friction points this paper is trying to address.

echocardiogram synthesis

video generation

sampling efficiency

temporal coherence

controllable generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

latent flow matching

one-step video generation

controllable echocardiogram synthesis