🤖 AI Summary
Existing continuous/discrete latent-space models for medical image segmentation struggle to capture long-range anatomical dependencies and intra-/inter-class relationships, leading to redundant associations, high false-negative rates, and poor generalization. To address this, we propose an optimal transport–based global relationship modeling framework operating on a discrete codebook, introducing a novel learnable reference alignment mechanism that enables dynamic discriminative representation learning without additional parameterized weight matrices—thereby overcoming the redundancy bottleneck inherent in self-attention aggregation. Our method integrates VQ-style discrete latent spaces, optimal transport theory, and a UNet backbone. Evaluated on multi-organ and cardiac segmentation benchmarks, it significantly outperforms state-of-the-art methods including SynergyNet, achieving marked improvements in both accuracy and generalization while maintaining computational efficiency suitable for clinical real-time analysis.
📝 Abstract
Continuous Latent Space (CLS) and Discrete Latent Space (DLS) models, such as AttnUNet and VQUNet, have excelled in medical image segmentation. Synergistic Continuous and Discrete Latent Space (CDLS) models further show promise in handling both fine- and coarse-grained information, but they struggle to model long-range dependencies. CLS- or CDLS-based models such as TransUNet and SynergyNet are adept at capturing long-range dependencies; however, because they rely heavily on feature pooling or aggregation via self-attention, they may capture dependencies among redundant regions. This hinders comprehension of anatomical structure, complicates the modeling of intra-class and inter-class dependencies, increases false negatives, and compromises generalization. To address these issues, we propose L2GNet, which learns global dependencies by relating discrete codes obtained from the DLS via optimal transport and aligning the codes on a trainable reference. L2GNet achieves discriminative on-the-fly representation learning without the additional weight matrices required by self-attention models, making it computationally efficient for medical applications. Extensive experiments on multi-organ segmentation and cardiac datasets demonstrate L2GNet's superiority over state-of-the-art methods, including the CDLS method SynergyNet, offering a novel approach to enhancing the performance of deep learning models in medical image analysis.
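The core mechanism described above, relating discrete codes via optimal transport and aligning them on a trainable reference, can be sketched with entropic OT (Sinkhorn iterations). This is a minimal illustrative sketch, not the paper's implementation: the code count, embedding dimension, reference size, and squared-Euclidean cost are all assumptions made for the example.

```python
import numpy as np

def sinkhorn(cost, eps=0.1, n_iters=100):
    """Entropic-regularized OT between uniform marginals.
    Returns the transport plan for the given cost matrix."""
    n, m = cost.shape
    K = np.exp(-cost / eps)          # Gibbs kernel
    a = np.full(n, 1.0 / n)          # uniform mass over codes
    b = np.full(m, 1.0 / m)          # uniform mass over reference vectors
    u = np.ones(n)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # scale columns to match b
        u = a / (K @ v)              # scale rows to match a
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
codes = rng.normal(size=(16, 8))     # 16 discrete code embeddings (dim 8)
reference = rng.normal(size=(4, 8))  # 4 trainable reference vectors

# Pairwise squared-Euclidean cost between codes and reference vectors
cost = ((codes[:, None, :] - reference[None, :, :]) ** 2).sum(-1)
T = sinkhorn(cost)

# Barycentric projection: re-express each code as an OT-weighted
# combination of reference vectors (no learned weight matrix involved)
aligned = (T / T.sum(axis=1, keepdims=True)) @ reference
print(aligned.shape)  # (16, 8)
```

In a trained model the reference would be a learnable parameter and the plan would be computed with differentiable Sinkhorn steps; here plain NumPy suffices to show how alignment replaces the parameterized query/key/value projections of self-attention.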