🤖 AI Summary
This work addresses key challenges in applying large language models to mental health, including the scarcity of high-quality interpretable training data, difficulties in knowledge integration, limitations of existing training paradigms, and the lack of multi-turn dialogue evaluation benchmarks. To tackle these issues, the authors propose the oMind framework, which leverages structured knowledge retrieval, model pruning, and human curation to construct a multi-task instruction-tuning dataset comprising 164,000 samples, explicitly aligned with multi-turn conversational capabilities. They also introduce oMind-Chat, the first fine-grained multi-turn dialogue benchmark for mental health, supported by a multidimensional expert scoring system. Experimental results demonstrate that the oMind-finetuned model significantly outperforms baseline approaches in both core competencies and multi-turn dialogue tasks, achieving an 80% win rate in human preference evaluations, thereby validating the efficacy of the proposed knowledge-guided fine-tuning and collaborative evaluation framework.
📝 Abstract
Large Language Models (LLMs) have shown remarkable capabilities on complex tasks, yet adapting them to the medical domain, and to mental health specifically, poses distinct challenges. Mental health is a rising concern globally, and LLMs have substantial potential to help address it. We highlight three primary challenges for LLMs in mental health: a lack of high-quality, interpretable, and knowledge-grounded training data; training paradigms restricted to core capabilities; and evaluation in multi-turn dialogue settings. To address these, we present the oMind framework, which includes training and aligning LLM agents for diverse capabilities including conversations, along with a high-quality ~164k-sample multi-task SFT dataset produced by our generation pipeline based on structured knowledge retrieval, LLM-based pruning, and human review. We also introduce oMind-Chat, a novel multi-turn benchmark dataset with expert-annotated turn-level and conversation-level rubrics. Diverse experiments on both core capabilities and conversations show that oMind LLMs consistently outperform baselines. oMind-LLM also exhibits significantly better reasoning, with up to an 80% win rate.