🤖 AI Summary
To address the limited interpretability of pathological speech detection models, this paper applies a multimodal large language model (ChatGPT-4o) to the task, using a few-shot in-context learning framework for interpretable detection. Methodologically, it prompts the model with labeled speech examples and employs an ablation study of factors such as input type and system prompts to understand what drives the results. Experiments show promising detection performance, with the model also generating natural-language explanations for its decisions, improving interpretability and supporting clinical applicability. The core contribution lies in moving beyond conventional black-box models, pointing toward pathological speech analysis that combines competitive accuracy with clinical interpretability.
📝 Abstract
Automatic pathological speech detection approaches have shown promising results, gaining attention as potential diagnostic tools alongside costly traditional methods. While these approaches can achieve high accuracy, their lack of interpretability limits their applicability in clinical practice. In this paper, we investigate the use of multimodal Large Language Models (LLMs), specifically ChatGPT-4o, for automatic pathological speech detection in a few-shot in-context learning setting. Experimental results show that this approach not only delivers promising performance but also provides explanations for its decisions, enhancing model interpretability. To further understand its effectiveness, we conduct an ablation study to analyze the impact of different factors, such as input type and system prompts, on the final results. Our findings highlight the potential of multimodal LLMs for further exploration and advancement in automatic pathological speech detection.
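The few-shot in-context learning setting described above can be sketched as assembling a prompt in which a handful of labeled speech examples precede the unlabeled query. This is a minimal illustration, not the paper's actual pipeline: the system prompt wording, the labels, and the use of the OpenAI Chat Completions audio format are all assumptions for the sketch.

```python
# Minimal sketch of building a few-shot, in-context prompt for
# pathological speech detection with a multimodal LLM.
# The system prompt, label strings, and message format below are
# illustrative assumptions, not the paper's actual prompts or data.
import base64


SYSTEM_PROMPT = (
    "You are a clinical speech analyst. Given a speech recording, "
    "classify it as 'healthy' or 'pathological' and explain your reasoning."
)


def audio_part(wav_bytes: bytes) -> dict:
    """Wrap raw WAV bytes as an input_audio content part
    (OpenAI Chat Completions multimodal format)."""
    return {
        "type": "input_audio",
        "input_audio": {
            "data": base64.b64encode(wav_bytes).decode("ascii"),
            "format": "wav",
        },
    }


def build_messages(few_shot: list[tuple[bytes, str]], query_wav: bytes) -> list[dict]:
    """Interleave labeled (audio, label) examples before the unlabeled query,
    so the model can infer the task in context."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for wav_bytes, label in few_shot:
        messages.append({"role": "user", "content": [audio_part(wav_bytes)]})
        messages.append({"role": "assistant", "content": label})
    # The final, unlabeled query the model must classify and explain.
    messages.append({"role": "user", "content": [audio_part(query_wav)]})
    return messages
```

The resulting message list could then be sent to an audio-capable model (e.g. via `client.chat.completions.create(model="gpt-4o-audio-preview", messages=messages)`), with the free-text response providing both the predicted label and the model's explanation.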