🤖 AI Summary
In medical image segmentation, U-shaped networks commonly supervise their multi-scale logits in isolation, neglecting the complementary relationship between coarse-grained contextual information and fine-grained details. To address this, we propose LoMix: a learnable, differentiable multi-scale logit fusion framework. Inspired by neural architecture search (NAS), LoMix employs lightweight, modular fusion operators—additive, multiplicative, concatenative, and attention-weighted—to automatically discover the optimal fusion strategy. Furthermore, we introduce softplus-parameterized loss weights that are jointly optimized with the network parameters, enabling fully interpretable, data-efficient multi-scale collaborative modeling with zero inference overhead. Evaluated on the Synapse dataset, LoMix achieves a 4.2% Dice improvement over single-output supervision; under few-shot settings, the gain reaches 9.23%; and across four benchmark datasets, it attains up to a 13.5% Dice increase—demonstrating strong generalizability and practical efficacy.
📝 Abstract
U-shaped networks output logits at multiple spatial scales, each capturing a different blend of coarse context and fine detail. Yet, training still treats these logits in isolation - either supervising only the final, highest-resolution logits or applying deep supervision with identical loss weights at every scale - without exploring mixed-scale combinations. Consequently, the decoder output misses the complementary cues that arise only when coarse and fine predictions are fused. To address this issue, we introduce LoMix (Logits Mixing), a NAS-inspired, differentiable plug-and-play module that generates new mixed-scale outputs and learns exactly how each of them should guide the training process. More precisely, LoMix mixes the multi-scale decoder logits with four lightweight fusion operators: addition, multiplication, concatenation, and attention-based weighted fusion, yielding a rich set of synthetic mutant maps. Every original or mutant map is given a softplus loss weight that is co-optimized with the network parameters, mimicking a one-step architecture search that automatically discovers the most useful scales, mixtures, and operators. Plugging LoMix into recent U-shaped architectures (i.e., a PVT-V2-B2 backbone with an EMCAD decoder) on the Synapse 8-organ dataset improves DICE by +4.2% over single-output supervision, +2.2% over deep supervision, and +1.5% over equally weighted additive fusion, all with zero inference overhead. When training data are scarce (e.g., one or two labeled scans), the advantage grows to +9.23%, underscoring LoMix's data efficiency. Across four benchmarks and diverse U-shaped networks, LoMix improves DICE by up to +13.5% over single-output supervision, confirming that learnable weighted mixed-scale fusion generalizes broadly while remaining data efficient, fully interpretable, and overhead-free at inference. Our code is available at https://github.com/SLDGroup/LoMix.
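The core idea above can be sketched in a few lines: build "mutant" maps from per-scale logits with simple fusion operators, then weight each map's loss with a learnable softplus parameter so every weight stays positive during joint optimization. The sketch below is a hypothetical NumPy illustration (function names and the pairwise-only mixing are our assumptions; the paper's module also includes concatenation and attention-based fusion and runs inside the training graph), not the authors' implementation.

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus: log(1 + exp(x)), always > 0,
    # so each learned loss weight remains positive.
    return np.log1p(np.exp(-np.abs(x))) + np.maximum(x, 0.0)

def mix_logits(logits):
    """Generate mutant maps from per-scale logit arrays (assumed already
    upsampled to a common spatial size). Illustrative subset: keep the
    originals and add pairwise additive and multiplicative fusions."""
    mutants = list(logits)
    n = len(logits)
    for i in range(n):
        for j in range(i + 1, n):
            mutants.append(logits[i] + logits[j])  # additive fusion
            mutants.append(logits[i] * logits[j])  # multiplicative fusion
    return mutants

def weighted_loss(per_map_losses, alphas):
    """Combine per-map losses with softplus-parameterized weights.
    In training, `alphas` would be co-optimized with network parameters."""
    w = softplus(np.asarray(alphas, dtype=float))
    return float(np.dot(w, per_map_losses))
```

With three decoder scales, this toy version yields 3 originals plus 2 mutants per pair (3 pairs), i.e., 9 supervised maps; gradients through `softplus` let training raise or suppress each map's influence automatically.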