🤖 AI Summary
To address the subjectivity and poor inter-observer consistency in identifying the muscularis propria layer in Hirschsprung disease (HD) histopathological diagnosis, this study pioneers the application of Vision Transformers (ViT) to automated muscularis propria segmentation in HD pathology images—overcoming the local receptive field limitation of convolutional neural networks (CNNs) and enabling global contextual modeling. Using calretinin immunohistochemical images, the ViT model accurately localizes the myenteric plexus region, facilitating quantitative aganglionosis analysis. Evaluation employs two clinically relevant metrics: Dice coefficient and Plexus Inclusion Rate (PIR). The ViT achieves 89.9% Dice and 100% PIR, significantly outperforming CNN (89.2% Dice, 96.0% PIR) and k-means clustering (80.7% Dice, 77.4% PIR). This work establishes a high-accuracy, interpretable AI-assisted paradigm for HD pathological diagnosis.
📝 Abstract
Hirschsprung's disease (HD) is a congenital defect diagnosed by identifying the absence of ganglion cells within the colon's muscularis propria, specifically within the myenteric plexus regions. Quantitative assessment of histopathology images of the colon, such as counting ganglion cells and assessing their spatial distribution, may offer advantages; however, this would be time-intensive for pathologists, costly, and subject to inter- and intra-rater variability. Previous research has demonstrated the potential of deep learning approaches to automate histopathology image analysis, including segmentation of the muscularis propria using convolutional neural networks (CNNs). Recently, Vision Transformers (ViTs) have emerged as a powerful deep learning approach due to their self-attention mechanism, which captures global context beyond the local receptive fields of CNNs. This study explores the application of ViTs to muscularis propria segmentation in calretinin-stained histopathology images and compares their performance against CNNs and shallow learning methods. The ViT model achieved a Dice score of 89.9% and a Plexus Inclusion Rate (PIR) of 100%, surpassing both the CNN (Dice score of 89.2%; PIR of 96.0%) and the k-means clustering method (Dice score of 80.7%; PIR of 77.4%). These results indicate that ViTs are a promising tool for advancing HD-related image analysis.
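For readers unfamiliar with the primary evaluation metric, the Dice score reported above measures the overlap between a predicted segmentation mask and the ground-truth mask. The sketch below is a minimal illustration (the function name and toy masks are mine, not from the paper; the authors' exact implementation is not given here):

```python
import numpy as np

def dice_coefficient(pred: np.ndarray, truth: np.ndarray) -> float:
    """Dice score between two binary segmentation masks:
    2 * |pred ∩ truth| / (|pred| + |truth|)."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    total = pred.sum() + truth.sum()
    if total == 0:
        return 1.0  # convention: two empty masks count as perfect overlap
    return 2.0 * intersection / total

# Toy 4x4 masks: 3 foreground pixels each, 2 of which overlap
pred = np.array([[1, 1, 0, 0],
                 [1, 0, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
truth = np.array([[1, 1, 0, 0],
                  [0, 1, 0, 0],
                  [0, 0, 0, 0],
                  [0, 0, 0, 0]])
print(round(dice_coefficient(pred, truth), 3))  # 2*2/(3+3) → 0.667
```

By contrast, the PIR is a task-specific metric counting the fraction of myenteric plexus regions captured by the predicted mask, which is why a model can score ~90% Dice yet differ markedly in PIR.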