OnCoCo 1.0: A Public Dataset for Fine-Grained Message Classification in Online Counseling Conversations

📅 2025-12-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing Motivational Interviewing (MI) classification frameworks rely on face-to-face interaction data and lack the granularity required for fine-grained analysis of online text-based counseling dialogues. Method: We introduce the first publicly available, fine-grained message classification dataset specifically designed for online psychosocial counseling, comprising 38 counselor and 28 client categories, with about 2,800 manually annotated and multi-round verified samples. The underlying integrative coding scheme covers both counselor and client messages, addressing the narrow focus and coarse granularity of MI-based category systems. Contribution/Results: Fine-tuning BERT and RoBERTa on our dataset yields substantial performance gains in fine-grained classification, outperforming general-purpose dialogue benchmarks. The dataset and code are fully open-sourced, addressing a critical resource gap in mental-health-focused NLP research.

📝 Abstract

This paper presents OnCoCo 1.0, a new public dataset for fine-grained message classification in online counseling. It is based on a new, integrative system of categories, designed to improve the automated analysis of psychosocial online counseling conversations. Existing category systems, predominantly based on Motivational Interviewing (MI), are limited by their narrow focus and dependence on datasets derived mainly from face-to-face counseling. This limits the detailed examination of textual counseling conversations. In response, we developed a comprehensive new coding scheme that differentiates between 38 types of counselor and 28 types of client utterances, and created a labeled dataset consisting of about 2,800 messages from counseling conversations. We fine-tuned several models on our dataset to demonstrate its applicability. The data and models are publicly available to researchers and practitioners. Thus, our work contributes a new type of fine-grained conversational resource to the language resources community, extending existing datasets for social and mental-health dialogue analysis.

Problem

Research questions and friction points this paper is trying to address.

Develops a fine-grained classification dataset for online counseling messages

Addresses limitations of existing systems focused on face-to-face counseling

Provides a resource for automated analysis of psychosocial counseling conversations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed a comprehensive coding scheme for 66 utterance types (38 counselor, 28 client)

Created a public dataset of about 2,800 labeled counseling messages

Fine-tuned models to demonstrate dataset applicability
