🤖 AI Summary
This study investigates automatic detection of cognitive impairment from speech transcripts in English, Slovene, and Korean. We compare zero-shot large language models, evaluated under three input configurations (transcript only, linguistic features only, and both combined), against supervised tabular models built on handcrafted linguistic features, transcript embeddings, and their early or late fusion. In this small-data setting, the supervised models generally outperform the zero-shot LLMs, particularly when engineered linguistic features are included and combined with embeddings. Few-shot experiments on embeddings show that the benefit of limited annotated data varies by language. Zero-shot LLMs nevertheless remain competitive no-training baselines for multilingual cognitive assessment.
📝 Abstract
We evaluate cognitive impairment (CI) classification from transcripts of speech in English, Slovene, and Korean. We compare zero-shot large language models (LLMs) used as direct classifiers under three input settings (transcript-only, linguistic-features-only, and combined) with supervised tabular approaches trained under a leave-one-out protocol. The tabular models operate on engineered linguistic features, transcript embeddings, and early or late fusion of both modalities. Across languages, zero-shot LLMs provide competitive no-training baselines, but supervised tabular models generally perform better, particularly when engineered linguistic features are included and combined with embeddings. Few-shot experiments focusing on embeddings indicate that the value of limited supervision is language-dependent: some languages benefit substantially from additional labelled examples, while others remain constrained without richer feature representations. Overall, the results suggest that, in small-data CI detection, structured linguistic features and simple fusion-based classifiers remain strong and reliable choices.
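To make the difference between early and late fusion under a leave-one-out protocol concrete, the following is a minimal sketch and not the authors' implementation. It assumes a nearest-centroid scorer as a stand-in for the tabular models (the abstract does not name a classifier), and the feature values, embeddings, and labels are invented toy data. Early fusion concatenates the linguistic features and the embedding into one vector before training; late fusion trains one model per modality and averages their scores.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]

def centroid(rows: List[Vector]) -> Vector:
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_centroids(X: List[Vector], y: List[int]) -> Dict[int, Vector]:
    """Stand-in 'model': one centroid per class (0 = control, 1 = CI)."""
    return {c: centroid([x for x, lab in zip(X, y) if lab == c]) for c in set(y)}

def prob_positive(model: Dict[int, Vector], x: Vector) -> float:
    """Soft score in [0, 1]: closer to the class-1 centroid gives a higher score."""
    d0, d1 = math.dist(x, model[0]), math.dist(x, model[1])
    return 0.5 if d0 + d1 == 0 else d0 / (d0 + d1)

def early_fusion(feats: List[Vector], emb: List[Vector], y: List[int]) -> Callable:
    """Early fusion: concatenate both modalities, then train a single model."""
    X = [f + e for f, e in zip(feats, emb)]
    def score(train: List[int], i: int) -> float:
        model = fit_centroids([X[j] for j in train], [y[j] for j in train])
        return prob_positive(model, X[i])
    return score

def late_fusion(feats: List[Vector], emb: List[Vector], y: List[int]) -> Callable:
    """Late fusion: train one model per modality, then average their scores."""
    def score(train: List[int], i: int) -> float:
        labs = [y[j] for j in train]
        m_feat = fit_centroids([feats[j] for j in train], labs)
        m_emb = fit_centroids([emb[j] for j in train], labs)
        return 0.5 * (prob_positive(m_feat, feats[i]) + prob_positive(m_emb, emb[i]))
    return score

def loo_accuracy(y: List[int], score: Callable) -> float:
    """Leave-one-out: hold out each sample once, train on all the others."""
    n = len(y)
    correct = 0
    for i in range(n):
        train = [j for j in range(n) if j != i]
        correct += int((score(train, i) >= 0.5) == bool(y[i]))
    return correct / n

# Invented toy data: two hypothetical linguistic features and a 3-d "embedding".
feats = [[0.8, 0.1], [0.7, 0.2], [0.9, 0.15], [0.3, 0.6], [0.2, 0.7], [0.35, 0.65]]
emb = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.0], [1.1, 0.0, 0.2],
       [0.0, 1.0, 0.9], [0.1, 0.9, 1.0], [0.0, 1.1, 0.8]]
labels = [0, 0, 0, 1, 1, 1]

if __name__ == "__main__":
    print("early fusion LOO accuracy:", loo_accuracy(labels, early_fusion(feats, emb, labels)))
    print("late fusion LOO accuracy:", loo_accuracy(labels, late_fusion(feats, emb, labels)))
```

In a real small-data setting, any feature scaling or selection would also need to be fitted inside each leave-one-out fold, as the centroids are here, so that the held-out transcript never influences training.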