🤖 AI Summary
Early-stage scoliosis in adolescents is often asymptomatic and hard to detect at scale: X-ray screening involves ionizing radiation and relies heavily on expert interpretation. To address these limitations, we propose a non-invasive, video-based gait analysis method, TG-MILNet. First, dynamic time warping (DTW) clustering segments gait videos into key gait phases, which handles temporal misalignment between sequences. Second, we introduce a text-guided multiple-instance learning framework with inter-bag temporal attention (IBTA) to emphasize diagnostically salient gait phases, coupled with a boundary-aware model (BAM) to improve sensitivity to borderline cases. Third, textual guidance from domain experts and large language models (LLMs) is incorporated to enhance feature representation and interpretability. Evaluated on the Scoliosis1K dataset, our approach achieves state-of-the-art performance, particularly under class imbalance and on challenging borderline cases, which suggests potential for use in large-scale screening.
📝 Abstract
Early-stage scoliosis is often difficult to detect, particularly in adolescents, where delayed diagnosis can lead to serious health issues. Traditional X-ray-based methods carry radiation risks and rely heavily on clinical expertise, limiting their use in large-scale screening. To overcome these challenges, we propose a Text-Guided Multi-Instance Learning Network (TG-MILNet) for non-invasive scoliosis detection using gait videos. To handle temporal misalignment in gait sequences, we employ Dynamic Time Warping (DTW) clustering to segment videos into key gait phases. To focus on the most relevant diagnostic features, we introduce an Inter-Bag Temporal Attention (IBTA) mechanism that highlights critical gait phases. Recognizing the difficulty of identifying borderline cases, we design a Boundary-Aware Model (BAM) to improve sensitivity to subtle spinal deviations. Additionally, we incorporate textual guidance from domain experts and large language models (LLMs) to enhance feature representation and improve model interpretability. Experiments on the large-scale Scoliosis1K gait dataset show that TG-MILNet achieves state-of-the-art performance, particularly in handling class imbalance and in accurately detecting challenging borderline cases. The code is available at https://github.com/lhqqq/TG-MILNet
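
For readers unfamiliar with DTW, the sketch below shows the textbook dynamic time warping distance that a clustering step like the one described above could build on. It is not taken from the TG-MILNet code: the function name `dtw_distance` and the toy 1-D signals are hypothetical, and the actual pipeline presumably operates on gait features rather than scalar sequences.

```python
import math


def dtw_distance(a, b):
    """Textbook DTW distance between two 1-D sequences (illustrative only)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical toy signals: the same "cycle" at two speeds, plus a flat signal.
slow = [0, 0, 1, 1, 2, 2, 1, 1, 0, 0]
fast = [0, 1, 2, 1, 0]
flat = [2, 2, 2, 2, 2]

print(dtw_distance(slow, fast))  # 0.0: same shape, different speed
print(dtw_distance(fast, flat))  # 6.0: shapes genuinely differ
```

Because the distance ignores differences in speed, sequences with the same shape but different timing end up close together, which is the property that makes DTW a natural fit for grouping temporally misaligned gait segments.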