🤖 AI Summary
To address insufficient cell-level feature modeling in cancer subtype classification from histopathological images, this paper proposes a dual-stream architecture. One stream uses a pathology foundation model to extract global tissue context; the other uses a recurrent Transformer with receptance-weighted key-value aggregation to model inter-cellular dependencies at computational cost linear in the number of cells. A bidirectional tissue–cell attention module aligns semantics across the two scales. The work is presented as the first to combine efficient recurrent modeling with bidirectional multi-scale fusion for histopathological analysis. On four cancer subtype classification benchmarks, the method outperforms existing models. The results indicate that effective cell-level feature aggregation and joint tissue–cell modeling are important for fine-grained pathological diagnosis.
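To make the linear-complexity claim concrete, the following is a minimal sketch of receptance-weighted key-value aggregation over a sequence of cells. It assumes the widely used RWKV-style formulation with a single channel, a scalar decay and a scalar bonus for the current element; the function names, the single-channel simplification and the absence of numerical stabilization are illustrative assumptions, not the paper's implementation.

```python
import math


def wkv_recurrent(keys, values, receptances, decay, bonus):
    """RWKV-style aggregation for one channel in O(N) time.

    Hypothetical sketch: `decay` and `bonus` stand in for learned
    per-channel parameters; inputs are per-cell scalars.
    """
    num, den = 0.0, 0.0  # running weighted sum and running normalizer
    outputs = []
    for k, v, r in zip(keys, values, receptances):
        cur = math.exp(bonus + k)              # weight of the current cell
        wkv = (num + cur * v) / (den + cur)    # normalized weighted average
        gate = 1.0 / (1.0 + math.exp(-r))      # sigmoid receptance gate
        outputs.append(gate * wkv)
        # Decay the past contributions, then add the current cell to the state.
        num = math.exp(-decay) * num + math.exp(k) * v
        den = math.exp(-decay) * den + math.exp(k)
    return outputs


def wkv_quadratic(keys, values, receptances, decay, bonus):
    """Reference O(N^2) form of the same computation, for comparison."""
    outputs = []
    for t in range(len(keys)):
        cur = math.exp(bonus + keys[t])
        num, den = cur * values[t], cur
        for i in range(t):
            # Earlier cells are down-weighted by their distance to position t.
            w = math.exp(-(t - 1 - i) * decay + keys[i])
            num += w * values[i]
            den += w
        gate = 1.0 / (1.0 + math.exp(-receptances[t]))
        outputs.append(gate * num / den)
    return outputs
```

Both functions compute the same outputs; the recurrent version carries only two running scalars per channel, so its cost grows linearly with the number of cells instead of quadratically.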
📝 Abstract
Accurate interpretation of histopathological images demands integration of information across spatial and semantic scales, from nuclear morphology and cellular textures to global tissue organization and disease-specific patterns. Although recent foundation models in pathology have shown strong capabilities in capturing global tissue context, their omission of cell-level feature modeling remains a key limitation for fine-grained tasks such as cancer subtype classification. To address this, we propose a dual-stream architecture that models the interplay between macroscale tissue features and aggregated cellular representations. To aggregate information from large cell sets efficiently, we introduce a receptance-weighted key-value aggregation model, a recurrent transformer that captures inter-cell dependencies with linear complexity. We further introduce a bidirectional tissue-cell interaction module that enables mutual attention between localized cellular cues and their surrounding tissue environment. Experiments on four histopathological subtype classification benchmarks show that the proposed method outperforms existing models, demonstrating the critical role of cell-level aggregation and tissue-cell interaction in fine-grained computational pathology.
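The bidirectional tissue-cell interaction can be pictured as two cross-attention passes, one in each direction. The sketch below is a simplified illustration under stated assumptions: it uses plain scaled dot-product attention with no learned projections, a single head and residual addition, and all function names are hypothetical. The abstract does not specify the module at this level of detail.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context):
    """Each query token takes an attention-weighted average of context tokens."""
    dim = len(queries[0])
    attended = []
    for q in queries:
        # Scaled dot-product similarity between the query and every context token.
        scores = [sum(a * b for a, b in zip(q, c)) / math.sqrt(dim) for c in context]
        weights = softmax(scores)
        attended.append(
            [sum(w * c[j] for w, c in zip(weights, context)) for j in range(dim)]
        )
    return attended


def bidirectional_interaction(tissue, cells):
    """Tissue tokens attend to cells and cells attend to tissue (assumed design)."""
    tissue_ctx = cross_attend(tissue, cells)   # tissue queries, cell context
    cell_ctx = cross_attend(cells, tissue)     # cell queries, tissue context
    # Residual connections keep each stream's original features.
    new_tissue = [[t + c for t, c in zip(row, ctx)] for row, ctx in zip(tissue, tissue_ctx)]
    new_cells = [[x + c for x, c in zip(row, ctx)] for row, ctx in zip(cells, cell_ctx)]
    return new_tissue, new_cells
```

Both directions are computed from the original inputs, so neither stream sees the other's updated features within the same pass. A real module would normally add learned query, key and value projections and multiple heads.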