AraHealthQA 2025 Shared Task Description Paper

📅 2025-08-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: High-quality Arabic medical question-answering (QA) resources are critically scarce, particularly in mental health (e.g., anxiety, depression, stigma reduction) and in clinical domains such as internal medicine and pediatrics. Method: We introduce AraHealthQA 2025, a comprehensive shared task for Arabic medical QA comprising two tracks: MentalQA (focused on mental health) and MedArabiQ (broad medical QA). It incorporates multi-turn QA and question rewriting subtasks, and employs a dual-review protocol combining expert annotators and clinical specialists to ensure cultural appropriateness and real-world scenario fidelity, alongside standardized evaluation metrics. Contribution/Results: We release a high-quality, expert-validated dataset; attract diverse participating teams; establish strong baselines; and, through systematic evaluation, reveal fundamental performance bottlenecks of current LLMs in Arabic medical QA. AraHealthQA 2025 provides an open benchmark and platform to advance multilingual medical AI research and evaluation.

📝 Abstract

We introduce AraHealthQA 2025, the Comprehensive Arabic Health Question Answering Shared Task, held in conjunction with ArabicNLP 2025 (co-located with EMNLP 2025). This shared task addresses the paucity of high-quality Arabic medical QA resources by offering two complementary tracks: MentalQA, focusing on Arabic mental health Q&A (e.g., anxiety, depression, stigma reduction), and MedArabiQ, covering broader medical domains such as internal medicine, pediatrics, and clinical decision making. Each track comprises multiple subtasks, evaluation datasets, and standardized metrics, facilitating fair benchmarking. The task was structured to promote modeling under realistic, multilingual, and culturally nuanced healthcare contexts. We outline the dataset creation, task design, evaluation framework, participation statistics, and baseline systems, and summarize the overall outcomes. We conclude with reflections on the performance trends observed and prospects for future iterations in Arabic health QA.

Problem

Research questions and friction points this paper is trying to address.

Addressing scarcity of Arabic medical QA resources

Focusing on mental health and broader medical domains

Promoting modeling in realistic multilingual healthcare contexts

Innovation

Methods, ideas, or system contributions that make the work stand out.

Arabic medical QA shared task

Mental health and general medical tracks

Multilingual, culturally nuanced healthcare modeling
