🤖 AI Summary
U-Net's skip connections preserve fine spatial detail, but they incur substantial GPU memory overhead because encoder feature maps must stay in memory until the decoder consumes them; this hinders deployment on resource-constrained edge devices. To jointly optimize memory efficiency and representational capacity in lightweight vision models, this work introduces an architectural paradigm that integrates self-supervised learning, sparse skip-connection reparameterization, and multimodal feature alignment. The framework spans diverse vision tasks, including image understanding, neural radiance field (NeRF)-based 3D reconstruction, cross-domain generalization, and embodied perception. Drawing on more than 30 peer-reviewed papers from ACCV 2024 Workshop Session 7, the proposed methods report new state-of-the-art results for lightweight models on benchmarks such as Cityscapes and ScanNet, while supporting real-time inference on edge hardware.
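To make the memory argument concrete, the sketch below estimates how many bytes of encoder activations a classic U-Net must keep alive for its skip connections; at the bottleneck, all of them are held at once. It assumes a textbook layout that is hypothetical here and not taken from any of the summarized papers: channels double and spatial size halves at each of `depth` encoder levels, every level except the bottleneck feeds a skip, and activations are fp32. The input size 512×1024 is an assumed half-resolution Cityscapes frame (native resolution 2048×1024). The `keep_levels` parameter is an illustrative stand-in for dropping skips; it is not the sparse reparameterization the summary refers to.

```python
from typing import Iterable, Optional

MiB = 2 ** 20


def skip_activation_bytes(
    height: int,
    width: int,
    base_channels: int = 64,   # assumed channel count at encoder level 0
    depth: int = 4,            # assumed number of encoder levels that feed skips
    batch: int = 1,
    bytes_per_elem: int = 4,   # 4 for fp32, 2 for fp16
    keep_levels: Optional[Iterable[int]] = None,  # hypothetical: which skips are retained
) -> int:
    """Bytes of encoder feature maps held for skip connections in a textbook U-Net.

    Level i is assumed to have base_channels * 2**i channels at
    (height / 2**i) x (width / 2**i) resolution. Only the levels listed in
    keep_levels are counted (all levels 0..depth-1 by default).
    """
    levels = range(depth) if keep_levels is None else keep_levels
    total = 0
    for level in levels:
        channels = base_channels * 2 ** level
        h = height // 2 ** level
        w = width // 2 ** level
        total += batch * channels * h * w * bytes_per_elem
    return total


full = skip_activation_bytes(512, 1024)
reduced = skip_activation_bytes(512, 1024, keep_levels=[2, 3])
print(full / MiB)     # 240.0
print(reduced / MiB)  # 48.0
```

Per-level memory in this layout scales as base_channels · H · W / 2^level, so the full-resolution skip alone accounts for 128 MiB of the 240 MiB total. Switching to fp16 (`bytes_per_elem=2`) halves every figure, and larger batches scale it linearly.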