🤖 AI Summary
Medical image analysis faces challenges including high annotation costs, poor generalizability, and difficulty in multi-modal adaptation. To address these, we propose VIS-MAE, a novel self-supervised framework that jointly pretrains dual decoders (segmentation and classification) guided by an anatomy-aware masking strategy. By coupling local structural reconstruction with global semantic modeling, VIS-MAE learns more robust medical features, and a contrastive regularization term further improves the discriminability of the learned representations. Evaluated on benchmarks including BTCV and KiTS19, VIS-MAE achieves a 3.2% improvement in segmentation Dice score and a 2.8% gain in classification accuracy over prior methods. Remarkably, with only 10% of the labeled data, it matches the performance of fully supervised state-of-the-art models, substantially reducing reliance on costly annotations.
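
As a rough illustration of how such a dual-decoder masked-pretraining objective might fit together, the PyTorch sketch below combines a per-patch reconstruction loss, a global classification head, and an InfoNCE-style contrastive regularizer. It is not the authors' implementation: the tiny model, the uniform random masking (standing in for the anatomy-aware strategy), and the synthetic labels are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DualDecoderMIMSketch(nn.Module):
    """Toy masked-image-modeling backbone with two heads.

    A shared encoder sees patches with a random subset zeroed out;
    a reconstruction decoder restores the hidden patches (local
    structure) and a classification head models global semantics.
    The anatomy-aware masking mentioned in the summary is replaced
    here by uniform random masking for illustration.
    """

    def __init__(self, patch_dim=256, embed_dim=128, num_classes=10):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(patch_dim, embed_dim), nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )
        self.recon_decoder = nn.Linear(embed_dim, patch_dim)  # local reconstruction branch
        self.cls_head = nn.Linear(embed_dim, num_classes)     # global classification branch
        self.proj = nn.Linear(embed_dim, 64)                  # projection for the contrastive term

    def forward(self, patches, mask_ratio=0.75):
        # patches: (batch, num_patches, patch_dim)
        mask = torch.rand(patches.shape[:2], device=patches.device) < mask_ratio
        corrupted = patches.masked_fill(mask.unsqueeze(-1), 0.0)  # hide masked patches

        tokens = self.encoder(corrupted)              # per-patch latent tokens
        recon = self.recon_decoder(tokens)            # predict original patch content
        pooled = tokens.mean(dim=1)                   # global representation
        logits = self.cls_head(pooled)
        emb = F.normalize(self.proj(pooled), dim=-1)  # unit-norm embedding for InfoNCE
        return recon, mask, logits, emb


def info_nce(emb_a, emb_b, temperature=0.1):
    """InfoNCE between two embeddings of the same batch (positives on the diagonal)."""
    logits = emb_a @ emb_b.t() / temperature
    targets = torch.arange(emb_a.size(0), device=emb_a.device)
    return F.cross_entropy(logits, targets)


if __name__ == "__main__":
    model = DualDecoderMIMSketch()
    x = torch.randn(4, 196, 256)            # 4 images, 196 patches of dimension 256
    labels = torch.randint(0, 10, (4,))     # synthetic labels, purely for illustration

    recon, mask, logits, emb1 = model(x)
    _, _, _, emb2 = model(x)                # a second random masking acts as a second "view"

    loss = (F.mse_loss(recon[mask], x[mask])   # local structural reconstruction on masked patches
            + F.cross_entropy(logits, labels)  # global semantic (classification) term
            + info_nce(emb1, emb2))            # contrastive regularization between views
    loss.backward()
    print(f"combined pretraining loss: {loss.item():.3f}")
```

In this sketch, the reconstruction loss is computed only on masked positions so the objective focuses on filling in hidden local structure, while the pooled token feeds the global classification and contrastive terms; how VIS-MAE actually couples these signals is described in the paper itself.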