🤖 AI Summary
In multi-landmark detection for medical imaging, existing approaches suffer from the "seesaw effect", where optimizing one landmark degrades others, while training a separate model per landmark incurs excessive memory and computation overhead. This paper proposes a collaborative framework that pairs dedicated single-landmark models with lightweight adapters, achieving the first explicit decoupling of what landmarks share from what makes each landmark distinct. High-accuracy single-landmark models are trained with dynamic pseudo-labeling and online template updating; adapters that combine shared and landmark-specific (private) weights enable parameter-efficient adaptation across landmarks. On multiple public benchmarks, the single-landmark models consistently surpass state-of-the-art (SOTA) joint-detection methods. The fused model slightly underperforms the full set of single-landmark models but still outperforms SOTA, while reducing GPU memory consumption by 62% and accelerating inference by 2.3×.
📝 Abstract
Landmark detection plays a crucial role in medical imaging applications such as disease diagnosis, bone age estimation, and therapy planning. However, training a model to detect multiple landmarks simultaneously often encounters the "seesaw phenomenon", where improvements in detecting certain landmarks lead to declines in detecting others. Training a separate model for each landmark avoids this but increases memory usage and computational overhead. To address these challenges, we first build on the belief that "landmarks are distinct": each model is dedicated to detecting a single landmark for high accuracy, and is trained with pseudo-labels and template data that are updated continuously during training. Then, grounded in the belief that "landmarks are also alike", we introduce an adapter-based fusion model that combines shared weights with landmark-specific weights, sharing model parameters efficiently while allowing flexible adaptation to individual landmarks. This approach not only significantly reduces memory and computational resource requirements but also effectively mitigates the seesaw phenomenon in multi-landmark training. Experimental results on publicly available medical image datasets demonstrate that the single-landmark models significantly outperform traditional jointly trained multi-landmark models in detecting individual landmarks. Although our adapter-based fusion model performs slightly below the combined results of all single-landmark models, it still surpasses current state-of-the-art methods while achieving a notable improvement in resource efficiency.
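To make the "shared weights plus landmark-specific weights" idea concrete, here is a minimal sketch in plain Python. It is not the paper's architecture: the low-rank form of the private adapter, the zero initialization, the class name `AdapterFusionLayer`, and the sizes (`dim=64`, `rank=4`, 19 landmarks) are all assumptions chosen for illustration. It only shows how one shared matrix with small per-landmark corrections compares in parameter count with a full matrix per landmark.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix, used here only as placeholder weights."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class AdapterFusionLayer:
    """Hypothetical layer: one shared matrix plus a low-rank private adapter per landmark.

    Assumption: the landmark-specific weights are modeled as a low-rank
    correction (up @ down) added to the shared output. The source only
    states that shared and landmark-specific weights are combined.
    """

    def __init__(self, dim, rank, num_landmarks, seed=0):
        rng = random.Random(seed)
        self.dim, self.rank, self.num_landmarks = dim, rank, num_landmarks
        self.shared = rand_matrix(dim, dim, rng)  # used by every landmark
        self.down = [rand_matrix(rank, dim, rng) for _ in range(num_landmarks)]
        # Up-projections start at zero, so every adapter is initially a no-op.
        self.up = [[[0.0] * rank for _ in range(dim)] for _ in range(num_landmarks)]

    def forward(self, x, landmark):
        base = matvec(self.shared, x)  # shared path
        delta = matvec(self.up[landmark], matvec(self.down[landmark], x))  # private path
        return [b + d for b, d in zip(base, delta)]

    def num_params(self):
        shared = self.dim * self.dim
        private = self.num_landmarks * 2 * self.dim * self.rank
        return shared + private


def separate_models_params(dim, num_landmarks):
    """Parameter count if every landmark kept its own full dim x dim matrix."""
    return num_landmarks * dim * dim


if __name__ == "__main__":
    layer = AdapterFusionLayer(dim=64, rank=4, num_landmarks=19)
    print("fused layer parameters:   ", layer.num_params())
    print("separate layer parameters:", separate_models_params(64, 19))
```

With these illustrative sizes, the fused layer holds 4,096 shared parameters plus 512 per landmark, against 4,096 per landmark for fully separate layers. Real savings depend on the actual backbone and adapter design, which the abstract does not specify.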