🤖 AI Summary
This study addresses the poor generalization of existing electrocardiogram (ECG) digitization methods to real-world clinical images, which hinders reliable processing of large archival datasets. The authors propose VLM-in-the-Loop, a plug-in quality assurance module that wraps any ECG digitization backend with closed-loop vision-language model (VLM) feedback through a standardized interface, without requiring modifications to the underlying digitizer. Its core mechanism, “tool grounding,” anchors the VLM’s assessment in quantitative evidence from domain-specific signal analysis tools. In a controlled ablation on 200 records with paired ground truth, tool grounding raises verdict consistency from 71% to 89% and widens fidelity separation (ΔPCC 0.03 → 0.08), and the effect replicates across three VLMs. Deployed across four backends, the module improves every one: 29.4% of borderline leads improved on the authors’ own pipeline, 41.2% of failed limb leads were recovered on ECG-Digitiser, and valid leads per image more than doubled on Open-ECG-Digitizer (2.5 → 5.8). On 428 real clinical hypertrophic cardiomyopathy (HCM) images, the integrated system reaches a 98.0% Excellent-quality rate.
📝 Abstract
ECG digitization could unlock billions of archived clinical records, yet existing methods collapse on real-world images despite strong benchmark numbers. We introduce \textbf{VLM-in-the-Loop}, a plug-in quality assurance module that wraps any digitization backend with closed-loop VLM feedback via a standardized interface, requiring no modification to the underlying digitizer. The core mechanism is \textbf{tool grounding}: anchoring VLM assessment in quantitative evidence from domain-specific signal analysis tools. In a controlled ablation on 200 records with paired ground truth, tool grounding raises verdict consistency from 71\% to 89\% and more than doubles fidelity separation ($\Delta$PCC 0.03 $\rightarrow$ 0.08). The effect replicates across three VLMs (Claude Opus~4, GPT-4o, Gemini~2.5 Pro), indicating that the gain is pattern-level rather than model-specific. Deployed across four backends, the module improves every one: 29.4\% of borderline leads improved on our pipeline; 41.2\% of failed limb leads were recovered on ECG-Digitiser; valid leads per image more than doubled on Open-ECG-Digitizer (2.5 $\rightarrow$ 5.8). On 428 real clinical HCM images, the integrated system reaches 98.0\% Excellent quality. Both the plug-in architecture and the tool-grounding mechanism are domain-parametric, suggesting broader applicability wherever quality criteria are objectively measurable.
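The closed-loop, tool-grounded pattern described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the `Digitizer` interface, the `Verdict` type, the `tool_evidence` metrics, the `quality_loop` function, and the toy `FakeBackend` and `rule_judge` are all hypothetical names. In a real deployment the judge would call a VLM with the image and the tool evidence; here a simple rule stands in for it so the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol, Sequence


class Digitizer(Protocol):
    """Hypothetical standardized interface: (image, hints) -> samples for one lead."""

    def digitize(self, image_path: str, hints: dict) -> Sequence[float]: ...


@dataclass
class Verdict:
    accept: bool
    hints: dict = field(default_factory=dict)  # suggested changes for the next attempt


def tool_evidence(signal: Sequence[float]) -> dict:
    """Stand-in for domain-specific signal tools: a few quantitative checks."""
    n = len(signal)
    if n == 0:
        return {"n_samples": 0, "amplitude_range": 0.0, "flat_fraction": 1.0}
    # Count consecutive sample pairs with identical values (flat-line indicator).
    flat = sum(1 for a, b in zip(signal, signal[1:]) if a == b)
    return {
        "n_samples": n,
        "amplitude_range": max(signal) - min(signal),
        "flat_fraction": flat / max(n - 1, 1),
    }


# The judge sees the image reference plus the tool evidence (the "grounding").
Judge = Callable[[str, dict], Verdict]


def quality_loop(backend: Digitizer, judge: Judge, image_path: str, max_rounds: int = 3):
    """Wrap a backend without modifying it: digitize, measure, judge, retry."""
    hints: dict = {}
    signal: Sequence[float] = []
    for round_idx in range(1, max_rounds + 1):
        signal = backend.digitize(image_path, hints)
        verdict = judge(image_path, tool_evidence(signal))
        if verdict.accept:
            return signal, round_idx, True
        hints = verdict.hints  # feed the judge's suggestion back to the backend
    return signal, max_rounds, False


class FakeBackend:
    """Toy backend: returns a flat trace unless asked to retrace."""

    def digitize(self, image_path: str, hints: dict) -> Sequence[float]:
        if hints.get("retrace"):
            return [0.0, 0.5, 1.0, 0.2, -0.3, 0.1]
        return [0.0] * 6


def rule_judge(image_path: str, evidence: dict) -> Verdict:
    """Rule-based placeholder for the VLM: reject mostly flat traces."""
    if evidence["flat_fraction"] > 0.5:
        return Verdict(accept=False, hints={"retrace": True})
    return Verdict(accept=True)


signal, rounds, ok = quality_loop(FakeBackend(), rule_judge, "ecg.png")
```

The backend is only touched through `digitize`, which is what lets the same loop wrap different digitizers; swapping `rule_judge` for a VLM-backed callable leaves `quality_loop` unchanged.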