🤖 AI Summary
Medical imaging datasets exhibit substantial resolution variability, severely impairing model generalizability; conventional resampling methods either incur information loss or impose prohibitive computational overhead. To address this, we propose the Resolution-Invariant Autoencoder (RI-AE), which replaces fixed-factor-2 up/down-sampling with a learnable, dynamic spatial scaling layer—enabling end-to-end, layer-adaptive resolution alignment and ensuring constant latent-space resolution. Our key innovations are: (1) a parameterized, differentiable resampling module that jointly optimizes geometric transformation and feature reconstruction, and (2) an uncertainty-aware multi-task training framework that jointly optimizes super-resolution, classification, and generative objectives. Evaluated across diverse tasks, RI-AE achieves robust cross-resolution performance: average PSNR improvement of 1.8 dB, 76% reduction in classification accuracy variance, and marginal computational overhead (<5% increase in FLOPs). To our knowledge, RI-AE is the first framework to establish generic, resolution-invariant representation learning for medical imaging.
📝 Abstract
Deep learning has significantly advanced medical imaging analysis, yet variations in image resolution remain an overlooked challenge. Most methods address this by resampling images, leading to either information loss or computational inefficiencies. While solutions exist for specific tasks, no unified approach has been proposed. We introduce a resolution-invariant autoencoder that adapts spatial resizing at each layer in the network via a learned variable resizing process, replacing fixed spatial down/upsampling at the traditional factor of 2. This ensures a consistent latent space resolution, regardless of input or output resolution. Our model enables various downstream tasks to be performed on an image latent whilst maintaining performance across different resolutions, overcoming the shortfalls of traditional methods. We demonstrate its effectiveness in uncertainty-aware super-resolution, classification, and generative modelling tasks and show how our method outperforms conventional baselines with minimal performance loss across resolutions.