🤖 AI Summary
Large medical inputs, such as high-resolution histopathology slides and surgical videos, pose significant challenges for end-to-end segmentation because of GPU memory constraints. Conventional models (e.g., U-Net, ViT) resort to patch- or frame-wise inference, which compromises global consistency and inference efficiency. To address this, we propose OctreeNCA: the first neural cellular automaton (NCA) to integrate an octree structure, which hierarchically expands each cell's neighborhood so that purely local updates can capture multi-scale global context. We further design a custom CUDA inference kernel optimized for memory coalescing and fine-grained parallelism. Experiments show that OctreeNCA uses 90% less GPU memory than a U-Net during inference, which makes it possible to segment a full 184-megapixel pathology slide or a 60-second surgical video in a single pass on one consumer-grade GPU. It preserves global structural coherence while running substantially faster and being more practical to deploy.
📝 Abstract
Medical applications demand segmentation of large inputs, like prostate MRIs, pathology slices, or videos of surgery. These inputs should ideally be processed in a single pass to provide the model with proper spatial or temporal context. When segmenting large inputs, the VRAM of the GPU becomes the bottleneck. Architectures like UNets or Vision Transformers scale poorly in VRAM consumption, resulting in patch- or frame-wise approaches that compromise global consistency and inference speed. The lightweight Neural Cellular Automaton (NCA) is a bio-inspired model that is size-invariant by construction. However, because its cells communicate only with their local neighbors, it lacks global knowledge. We propose OctreeNCA, which generalizes the neighborhood definition using an octree data structure. Our generalized neighborhood definition enables the efficient propagation of global knowledge. Since deep learning frameworks are mainly developed for large multi-layer networks, their implementations do not fully leverage the advantages of NCAs. We implement an NCA inference function in CUDA that further reduces VRAM demands and increases inference speed. Our OctreeNCA segments high-resolution images and videos quickly while occupying 90% less VRAM than a UNet during evaluation. This allows us to segment 184-megapixel pathology slices or 1-minute surgical videos at once.
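To make the idea of a hierarchical neighborhood more concrete, the following is a minimal NumPy sketch of a coarse-to-fine NCA. It is not the authors' implementation. It assumes a 2D input (so the pyramid is quadtree-like rather than a true octree), untrained random weights, and a simple 3x3 mean as the perception step. All function names (`init_weights`, `perceive`, `nca_step`, `coarse_to_fine`) and parameters are hypothetical and chosen only for illustration.

```python
import numpy as np


def init_weights(channels=8, hidden=16, seed=0):
    """Random (untrained) weights for the per-cell update rule. Illustrative only."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.2, size=(2 * channels, hidden))
    w2 = rng.normal(0.0, 0.2, size=(hidden, channels))
    return w1, w2


def perceive(state):
    """Concatenate each cell's state with the mean over its 3x3 neighborhood."""
    h, w, _ = state.shape
    padded = np.pad(state, ((1, 1), (1, 1), (0, 0)), mode="edge")
    neigh = sum(
        padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)
    ) / 9.0
    return np.concatenate([state, neigh], axis=-1)


def nca_step(state, w1, w2):
    """One local update: the same small MLP is applied to every cell, residually."""
    hidden = np.maximum(perceive(state) @ w1, 0.0)  # ReLU
    return state + hidden @ w2


def downsample(x):
    """Average-pool by 2 along both spatial axes (sizes must be even)."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def upsample(x):
    """Nearest-neighbor upsampling by 2 along both spatial axes."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


def coarse_to_fine(image, w1, w2, levels=3, steps=4):
    """Run the same NCA rule from the coarsest level down to full resolution.

    `image` has shape (H, W, C_in); H and W must be divisible by 2**(levels - 1).
    """
    channels = w2.shape[1]
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))

    state = None
    for img in reversed(pyramid):  # coarsest level first
        h, w, c_in = img.shape
        state = np.zeros((h, w, channels)) if state is None else upsample(state)
        state[..., :c_in] = img  # write this level's input into the first channels
        for _ in range(steps):
            state = nca_step(state, w1, w2)
    return state


if __name__ == "__main__":
    w1, w2 = init_weights()
    image = np.random.default_rng(1).random((32, 32, 1))
    print(coarse_to_fine(image, w1, w2, levels=4, steps=4).shape)
```

In this sketch, each step only lets a cell see its direct neighbors, so a single-level NCA needs many steps before distant cells influence each other. Running a few steps on a coarse level first lets information cross the whole input cheaply, and the upsampled state carries that context down to the finer levels. Because the update rule is purely per-cell, the same weights apply to any input size.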