🤖 AI Summary
This work addresses negative transfer in cross-center thyroid ultrasound analysis, which arises when a single shared backbone must serve both segmentation and malignancy risk assessment while geometric and textural cues degrade asymmetrically under domain shift. To mitigate this, the authors propose MKGA (Multi-Kernel Gated Adapter), a lightweight decoder-side adapter, along with its residual variant ResMKGA. The adapters refine multi-scale skip-connection features using complementary receptive fields and apply semantic, context-conditioned gating to suppress artifact-prone content before fusion. The study also characterizes how Vision Transformers (ViTs) and CNNs differ in cross-center transfer: ViTs transfer geometric priors that benefit segmentation, whereas CNNs more reliably preserve texture cues for malignancy discrimination under strong shift and artifacts. Evaluated on two thyroid ultrasound benchmarks with ResNet34 and MedSAM backbones, the adapters improve out-of-domain segmentation and, in the CNN setting, yield clear gains in TI-RADS diagnostic accuracy over standard multi-task baselines.
📝 Abstract
Thyroid ultrasound (US) automation couples two competing requirements: global, geometry-driven reasoning for nodule delineation and local, texture-driven reasoning for malignancy risk assessment. Under cross-center domain shift, these cues degrade asymmetrically, yet most multi-task pipelines rely on a single shared backbone, often inducing negative transfer. In this paper, we characterize this interference across CNN (ResNet34) and medical ViT (MedSAM) backbones, and observe a consistent trend: ViTs transfer geometric priors that benefit segmentation, whereas CNNs more reliably preserve texture cues for malignancy discrimination under strong shift and artifacts. Motivated by this failure mode, we propose a lightweight family of decoder-side adapters, the Multi-Kernel Gated Adapter (MKGA) and a residual variant (ResMKGA), which refine multi-scale skip features using complementary receptive fields and apply semantic, context-conditioned gating to suppress artifact-prone content before fusion. Across two US benchmarks, the proposed adapters improve cross-center robustness: they strengthen out-of-domain segmentation and, in the CNN setting, yield clear gains in clinical TI-RADS diagnostic accuracy compared to standard multi-task baselines. Code and models will be released.
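To make the adapter idea concrete, here is a minimal, dependency-free toy sketch of "multi-kernel refinement followed by context-conditioned gating" on a single-channel feature map. It is not the authors' implementation: the box filters stand in for learned convolutions, the sigmoid of a `context` map stands in for the learned semantic gate, and all names (`box_filter`, `multi_kernel`, `gated_adapter`, the kernel sizes 1, 3, 5) are assumptions chosen for illustration.

```python
import math


def box_filter(x, k):
    """Mean filter with a k x k window and zero padding.

    Assumption: a fixed box filter stands in for a learned k x k convolution.
    """
    h, w = len(x), len(x[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            total = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < h and 0 <= jj < w:
                        total += x[ii][jj]
            out[i][j] = total / (k * k)
    return out


def multi_kernel(x, kernels=(1, 3, 5)):
    """Average several branches with different receptive fields."""
    branches = [box_filter(x, k) for k in kernels]
    h, w = len(x), len(x[0])
    return [
        [sum(b[i][j] for b in branches) / len(branches) for j in range(w)]
        for i in range(h)
    ]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_adapter(skip, context, residual=False):
    """Refine a skip feature map, then scale it by a gate derived from context.

    skip    : 2D list, encoder skip feature (single channel for simplicity)
    context : 2D list of the same shape, hypothetical semantic logits;
              strongly negative values close the gate (suppress the feature)
    residual: if True, add the gated refinement back onto the raw skip
              feature (a guess at what a residual variant might look like)
    """
    refined = multi_kernel(skip)
    h, w = len(skip), len(skip[0])
    out = [
        [sigmoid(context[i][j]) * refined[i][j] for j in range(w)]
        for i in range(h)
    ]
    if residual:
        out = [[skip[i][j] + out[i][j] for j in range(w)] for i in range(h)]
    return out
```

In this toy, a context value of 0 passes half of the refined feature through, while a strongly negative context value drives the output toward zero, which mirrors the idea of suppressing artifact-prone content before the skip feature is fused in the decoder. A real adapter would operate on multi-channel tensors with learned kernels and a learned gate.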