Technical Report on classification of literature related to children speech disorder

📅 2025-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: The rapid growth of literature on childhood speech disorders makes manual systematic reviews slow and hard to keep current. Method: The report describes an NLP-based classification pipeline built on 4,804 PubMed-indexed articles published after 2015. It applies two topic models, LDA and BERTopic, together with a custom stop word list tailored to speech pathology to improve relevance and precision. Contribution/Results: The models identify 14 clinically meaningful topic clusters. LDA achieves a coherence score of 0.42 and a reported perplexity of −7.5 (presumably a log-scale value, since a raw perplexity cannot be negative); BERTopic yields an outlier topic proportion below 20%. The report presents these results as a foundation for automating literature reviews in speech-language pathology.

📝 Abstract

This technical report presents a natural language processing (NLP)-based approach for systematically classifying scientific literature on childhood speech disorders. We retrieved and filtered 4,804 relevant articles published after 2015 from the PubMed database using domain-specific keywords. After cleaning and pre-processing the abstracts, we applied two topic modeling techniques, Latent Dirichlet Allocation (LDA) and BERTopic, to identify latent thematic structures in the corpus. Our models uncovered 14 clinically meaningful clusters, such as infantile hyperactivity and abnormal epileptic behavior. To improve relevance and precision, we incorporated a custom stop word list tailored to speech pathology. Evaluation results showed that the LDA model achieved a coherence score of 0.42 and a perplexity of -7.5, indicating strong topic coherence and predictive performance. The BERTopic model exhibited a low proportion of outlier topics (less than 20%), demonstrating its capacity to classify heterogeneous literature effectively. These results provide a foundation for automating literature reviews in speech-language pathology.
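
The two evaluation figures quoted in the abstract can be made concrete with a small sketch. It rests on two assumptions that the report does not confirm: that the value of -7.5 is a base-2 per-word log-likelihood bound of the kind gensim's `LdaModel.log_perplexity` returns, and that the outlier proportion is the share of documents BERTopic assigns to topic -1. The function names and the toy labels are illustrative only.

```python
def perplexity_from_bound(per_word_bound: float) -> float:
    """Convert a base-2 per-word log-likelihood bound into a perplexity."""
    return 2.0 ** (-per_word_bound)


def outlier_fraction(topic_labels: list[int], outlier_label: int = -1) -> float:
    """Return the share of documents assigned to the outlier topic."""
    if not topic_labels:
        raise ValueError("topic_labels must not be empty")
    outliers = sum(1 for label in topic_labels if label == outlier_label)
    return outliers / len(topic_labels)


# Assumption: -7.5 is a log-scale bound, since a raw perplexity cannot be negative.
perplexity = perplexity_from_bound(-7.5)

# Toy topic labels for ten documents (hypothetical, not data from the report).
toy_labels = [0, 3, -1, 2, 2, 5, 0, 1, 4, 3]
fraction = outlier_fraction(toy_labels)

print(f"perplexity = {perplexity:.1f}")      # about 181.0 under the stated assumption
print(f"outlier fraction = {fraction:.0%}")  # 10% for the toy labels
```

Under that reading, a bound of -7.5 corresponds to a perplexity of roughly 181, and the reported "less than 20%" threshold would be checked as `fraction < 0.20`.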

Problem

Research questions and friction points this paper is trying to address.

Classifying scientific literature on childhood speech disorders using NLP

Identifying latent thematic structures in speech disorder research articles

Automating literature reviews in speech-language pathology via topic modeling

Innovation

Methods, ideas, or system contributions that make the work stand out.

NLP-based approach for literature classification

LDA and BERTopic for topic modeling

Custom stop word list for precision
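
A minimal sketch of how a custom stop word list might be applied before topic modeling is given here. The report does not publish its list, so the entries in `DOMAIN_STOP_WORDS`, the short `GENERAL_STOP_WORDS` sample, and the `preprocess` helper are all hypothetical; the idea is only that terms appearing in almost every abstract of the corpus carry little information for separating topics.

```python
import re

# Small generic stop word sample; a real pipeline would use a fuller list.
GENERAL_STOP_WORDS = {"the", "of", "in", "and", "a", "with", "to", "was", "were", "for"}

# Hypothetical domain-specific entries: words so frequent in this corpus
# that they would appear in nearly every topic.
DOMAIN_STOP_WORDS = {"speech", "children", "child", "disorder", "disorders", "study", "results"}


def preprocess(abstract: str, stop_words: set[str]) -> list[str]:
    """Lowercase, keep alphabetic tokens, and drop stop words and very short tokens."""
    tokens = re.findall(r"[a-z]+", abstract.lower())
    return [t for t in tokens if t not in stop_words and len(t) > 2]


example = "Results of a study of stuttering therapy in children with speech disorders."
tokens = preprocess(example, GENERAL_STOP_WORDS | DOMAIN_STOP_WORDS)
print(tokens)  # ['stuttering', 'therapy']
```

The filtered token lists would then be passed to the topic models; removing the domain terms leaves the words that actually distinguish one cluster from another.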
