🤖 AI Summary
Existing approaches to nucleus detection in whole-slide images (WSIs) have two limitations: sliding-window methods neglect long-range tissue context, and large-field-of-view (FoV) cropping incurs high inference latency. To address both, this paper proposes ContextNucleus. The method introduces a temporal aggregation mechanism that fuses features from previously visited sliding windows, eliminating the need for computationally expensive large-FoV cropping. Grid Pooling compresses dense features into sparse contextual tokens, which are injected into high-magnification local modeling for context-aware nucleus instance segmentation. The authors further establish OCELOT-seg, the first context-aware nucleus instance segmentation benchmark tailored for WSIs. Experiments across multiple WSI datasets demonstrate state-of-the-art performance, with significant improvements in both detection and segmentation accuracy and a 37% reduction in average inference latency. The code, dataset, and pretrained models are publicly released.
📝 Abstract
Nucleus detection in histopathology whole slide images (WSIs) is crucial for a broad spectrum of clinical applications. Current approaches for nucleus detection in gigapixel WSIs utilize a sliding window methodology, which overlooks broader contextual information (e.g., tissue structure) and easily leads to inaccurate predictions. To address this problem, recent studies additionally crop a large Field-of-View (FoV) region around each sliding window to extract contextual features. However, such methods substantially increase the inference latency. In this paper, we propose an effective and efficient context-aware nucleus detection algorithm. Specifically, instead of leveraging large FoV regions, we aggregate contextual clues from off-the-shelf features of historically visited sliding windows. This design greatly reduces computational overhead. Moreover, compared to large FoV regions at a low magnification, the sliding window patches have higher magnification and provide finer-grained tissue details, thereby enhancing the detection accuracy. To further improve the efficiency, we propose a grid pooling technique to compress dense feature maps of each patch into a few contextual tokens. Finally, we craft OCELOT-seg, the first benchmark dedicated to context-aware nucleus instance segmentation. Code, dataset, and model checkpoints will be available at https://github.com/windygoo/PathContext.
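The two ideas in the abstract, compressing each patch's dense feature map into a few tokens and reusing tokens from already-visited windows as context, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the pooling operator, so mean pooling over equal grid cells is an assumption here, as are the function names `grid_pool` and `gather_context`, the `(row, col)` cache keys, and the `radius` neighbourhood rule. Plain Python lists keep the sketch dependency-free; a practical version would presumably use tensor operations.

```python
def grid_pool(feature_map, grid):
    """Compress a dense feature map into grid * grid tokens.

    feature_map: list of C channels, each an H x W nested list of floats.
    Returns a list of grid * grid tokens (row-major), each a list of C floats.
    Assumption: each token is the mean of one grid cell; the paper's
    actual pooling operator may differ.
    """
    channels = len(feature_map)
    height, width = len(feature_map[0]), len(feature_map[0][0])
    if height % grid or width % grid:
        raise ValueError("H and W must be divisible by grid in this sketch")
    cell_h, cell_w = height // grid, width // grid

    tokens = []
    for gi in range(grid):
        for gj in range(grid):
            token = []
            for ch in range(channels):
                cell = [
                    feature_map[ch][y][x]
                    for y in range(gi * cell_h, (gi + 1) * cell_h)
                    for x in range(gj * cell_w, (gj + 1) * cell_w)
                ]
                token.append(sum(cell) / len(cell))
            tokens.append(token)
    return tokens


def gather_context(cache, row, col, radius=1):
    """Collect tokens of previously visited windows near window (row, col).

    cache: dict mapping (row, col) of a visited window to its token list.
    The square neighbourhood of size `radius` is a hypothetical selection
    rule chosen for illustration only.
    """
    context = []
    for (r, c), tokens in cache.items():
        is_self = (r, c) == (row, col)
        if not is_self and abs(r - row) <= radius and abs(c - col) <= radius:
            context.extend(tokens)
    return context


# Usage: pool a 1-channel 4 x 4 map into 2 x 2 = 4 tokens and cache them.
feature_map = [[[float(4 * y + x) for x in range(4)] for y in range(4)]]
cache = {(0, 0): grid_pool(feature_map, 2)}
context_for_next_window = gather_context(cache, 0, 1)
```

Because the cached tokens come from features the detector has already computed for earlier windows, gathering context in this way adds no extra backbone passes, which is the source of the latency advantage the abstract claims over large-FoV cropping.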