OctreeNCA: Single-Pass 184 MP Segmentation on Consumer Hardware

📅 2025-08-09
🤖 AI Summary
Large-scale medical images, such as high-resolution histopathology slides and surgical videos, are difficult to segment end to end because of GPU memory constraints. Conventional models (e.g., U-Net, ViT) fall back on patch- or frame-wise inference, which compromises global consistency and inference efficiency. To address this, we propose OctreeNCA, the first neural cellular automaton (NCA) to integrate an octree structure: the neighborhood expands hierarchically, so local updates capture multi-scale global context. We further design a custom CUDA inference kernel optimized for memory coalescing and fine-grained parallelism. Experiments show that OctreeNCA uses 90% less GPU memory than U-Net during evaluation, enabling full-image segmentation of 184-megapixel pathology slides or 60-second surgical videos on a single consumer-grade GPU. It preserves global structural coherence while also improving inference speed and deployment practicality.

📝 Abstract
Medical applications demand segmentation of large inputs, like prostate MRIs, pathology slices, or videos of surgery. These inputs should ideally be inferred at once to provide the model with proper spatial or temporal context. When segmenting large inputs, the VRAM consumption of the GPU becomes the bottleneck. Architectures like UNets or Vision Transformers scale very poorly in VRAM consumption, resulting in patch- or frame-wise approaches that compromise global consistency and inference speed. The lightweight Neural Cellular Automaton (NCA) is a bio-inspired model that is by construction size-invariant. However, due to its local-only communication rules, it lacks global knowledge. We propose OctreeNCA by generalizing the neighborhood definition using an octree data structure. Our generalized neighborhood definition enables the efficient traversal of global knowledge. Since deep learning frameworks are mainly developed for large multi-layer networks, their implementation does not fully leverage the advantages of NCAs. We implement an NCA inference function in CUDA that further reduces VRAM demands and increases inference speed. Our OctreeNCA segments high-resolution images and videos quickly while occupying 90% less VRAM than a UNet during evaluation. This allows us to segment 184 Megapixel pathology slices or 1-minute surgical videos at once.
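
The abstract's point that an NCA is size-invariant but only communicates locally can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `nca_step` and `make_weights`, the 3x3 neighbourhood, the two-layer per-cell network, the channel counts, and the stochastic update mask are all assumptions chosen for illustration, and the weights are random rather than trained.

```python
import numpy as np

def nca_step(state, w1, w2, rng, fire_rate=0.5):
    """One illustrative NCA update on an (H, W, C) state grid."""
    h, w, c = state.shape
    padded = np.pad(state, ((1, 1), (1, 1), (0, 0)), mode="edge")
    # Perception: stack each cell's 3x3 neighbourhood into a (H, W, 9*C) tensor.
    neigh = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    perception = np.concatenate(neigh, axis=-1)
    # The same small two-layer network is applied to every cell.
    hidden = np.maximum(perception @ w1, 0.0)
    delta = hidden @ w2
    # Each cell applies its update only with probability fire_rate.
    mask = rng.random((h, w, 1)) < fire_rate
    return state + delta * mask

def make_weights(channels, hidden, rng):
    """Random (untrained) weights; their shapes do not depend on image size."""
    w1 = rng.normal(0.0, 0.1, (9 * channels, hidden))
    w2 = rng.normal(0.0, 0.1, (hidden, channels))
    return w1, w2

rng = np.random.default_rng(0)
w1, w2 = make_weights(channels=8, hidden=32, rng=rng)

# The same weights run on grids of different sizes (size invariance).
small = nca_step(rng.random((16, 16, 8)), w1, w2, rng)
large = nca_step(rng.random((64, 48, 8)), w1, w2, rng)
print(small.shape, large.shape)
```

Because each step reads only the 3x3 neighbourhood, a change in one cell can influence cells at most one position away per step, which is the "local-only communication" limitation the abstract refers to.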

Problem

Research questions and friction points this paper is trying to address.

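Segmenting large medical inputs in one pass, so the model sees full spatial or temporal context
High VRAM consumption of UNets and Vision Transformers forces patch- or frame-wise inference, which compromises global consistency and speed
Lightweight Neural Cellular Automata communicate only locally and therefore lack global knowledge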
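
As a rough back-of-the-envelope illustration of the VRAM problem, the memory for a single dense float32 feature map at full 184-megapixel resolution can be estimated as follows. The channel counts are hypothetical and not taken from the paper; a real network holds several such maps at various resolutions, so this is only an order-of-magnitude sketch.

```python
def feature_map_gb(megapixels, channels, bytes_per_value=4):
    """Memory for one dense feature map, in decimal gigabytes."""
    return megapixels * 1e6 * channels * bytes_per_value / 1e9

# Hypothetical channel counts: an RGB input and two wider feature maps.
for channels in (3, 16, 64):
    print(channels, "channels:", round(feature_map_gb(184, channels), 3), "GB")
```

Under these assumptions a 64-channel full-resolution map alone needs roughly 47 GB, more than a consumer GPU typically provides, which is why conventional architectures fall back to patches.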

Innovation

Methods, ideas, or system contributions that make the work stand out.

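OctreeNCA generalizes the NCA neighborhood with an octree so that global knowledge can be traversed efficiently
A custom CUDA inference implementation of the NCA reduces VRAM use and increases inference speed
The size-invariant NCA enables single-pass segmentation of high-resolution images and videos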
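
To see why a hierarchical neighbourhood helps local rules carry global information, the following toy sketch compares a flat grid with a coarse-to-fine pyramid. It is a 2D simplification (a quadtree-like pyramid rather than an octree) and not the paper's algorithm: `local_step` is a hand-written 3x3 max rule standing in for a learned NCA update, and `downsample`, `upsample`, and `coarse_to_fine` are hypothetical helper names.

```python
import numpy as np

def local_step(grid):
    """Stand-in for a learned NCA rule: each cell takes the max of its 3x3 neighbourhood."""
    h, w = grid.shape
    p = np.pad(grid, 1, mode="edge")
    return np.max([p[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)], axis=0)

def downsample(grid):
    """Halve the resolution by max-pooling 2x2 blocks (sides must be even)."""
    h, w = grid.shape
    return grid.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

def upsample(grid):
    """Double the resolution by repeating each cell in a 2x2 block."""
    return grid.repeat(2, axis=0).repeat(2, axis=1)

def coarse_to_fine(grid, levels, steps_per_level):
    # pyramid[0] is full resolution, pyramid[-1] is the coarsest level.
    pyramid = [grid]
    for _ in range(levels - 1):
        pyramid.append(downsample(pyramid[-1]))
    state = pyramid[-1]
    for level in range(levels - 1, -1, -1):
        for _ in range(steps_per_level):
            state = local_step(state)
        if level > 0:
            # Carry the coarse state to the next finer level and merge it with that level's input.
            state = np.maximum(upsample(state), pyramid[level - 1])
    return state

image = np.zeros((64, 64))
image[0, 0] = 1.0  # a single signal in the top-left corner

# Flat grid: 20 local steps move the signal at most 20 cells.
flat = image
for _ in range(20):
    flat = local_step(flat)

# Pyramid with the same step budget: 5 levels x 4 steps.
hier = coarse_to_fine(image, levels=5, steps_per_level=4)
print(flat[-1, -1], hier[-1, -1])  # prints "0.0 1.0"
```

With the same budget of 20 local steps, the signal reaches the opposite corner only in the hierarchical run, because one step on the coarsest 4x4 level spans 16 full-resolution pixels.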