🤖 AI Summary
In empathy computing, noisy labels in web-sourced data severely hinder model performance. To address this, we propose a large language model (LLM)-enhanced supervised training framework, the first to leverage LLM-as-a-Service for *in vitro* label generation rather than conventional *in vivo* prompt-based direct inference. Our method jointly employs RoBERTa-based encoders, GPT-4 for high-fidelity label synthesis, statistical noise correction, and multi-stage fine-tuning, with the Pearson correlation coefficient as the primary evaluation metric. Evaluated on the NewsEmp benchmark, our approach achieves a new state-of-the-art score of 0.648, significantly outperforming all baselines. To foster reproducibility and community advancement, we publicly release both the source code and the LLM-generated dataset.
📝 Abstract
Large language models (LLMs) have revolutionised numerous fields, with LLM-as-a-service (LLMSaaS) offering strong generalisation ability and accessible solutions without the need for costly training. In contrast to the widely studied prompt engineering for solving tasks directly (in vivo), this paper explores the potential of LLMSaaS in in vitro applications: using LLM-generated labels to support the supervised training of mainstream models through (1) noisy label correction and (2) training data augmentation. We evaluate this approach in the emerging field of empathy computing -- automating the prediction of psychological questionnaire outcomes from inputs such as text sequences. Crowdsourced datasets in this domain often suffer from noisy labels that misrepresent underlying empathy. By leveraging LLM-generated labels to train pre-trained language models (PLMs) such as RoBERTa, we achieve statistically significant accuracy improvements over baselines, reaching a state-of-the-art Pearson correlation coefficient of 0.648 on the NewsEmp benchmark. We also offer insightful discussions of current challenges in empathy computing, biases in training data, and evaluation metric selection. Code and LLM-generated data are available at https://github.com/hasan-rakibul/LLMPathy (available once the paper is accepted).
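The two ingredients named in the abstract, correcting noisy crowdsourced labels with LLM-generated labels and scoring predictions with the Pearson correlation coefficient, can be sketched as follows. This is a hypothetical illustration, not the paper's actual method: the blending rule `correct_labels`, its `alpha` weight, and the toy empathy scores are all assumptions introduced here for clarity.

```python
import numpy as np

def correct_labels(crowd_labels, llm_labels, alpha=0.5):
    """Blend noisy crowdsourced labels with LLM-generated labels.

    A simple convex combination; alpha is an assumed hyperparameter,
    not a value taken from the paper.
    """
    crowd = np.asarray(crowd_labels, dtype=float)
    llm = np.asarray(llm_labels, dtype=float)
    return alpha * crowd + (1 - alpha) * llm

def pearson_r(pred, gold):
    """Pearson correlation coefficient between predictions and gold scores."""
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    return float(np.corrcoef(pred, gold)[0, 1])

# Toy empathy scores on a 1-7 scale (illustrative only).
crowd = [6.5, 2.0, 4.0, 7.0]   # noisy crowdsourced annotations
llm = [5.5, 3.0, 4.5, 6.0]     # LLM-generated labels for the same items
corrected = correct_labels(crowd, llm)
print(corrected)  # → [6.0, 2.5, 4.25, 6.5]

# A model trained on the corrected labels would then be evaluated by
# correlating its predictions with held-out gold questionnaire scores.
print(pearson_r(corrected, [6.0, 2.5, 4.2, 6.4]))
```

The corrected labels would replace the raw crowd labels as regression targets when fine-tuning a PLM such as RoBERTa; Pearson correlation, rather than accuracy, is the natural metric here because empathy scores are continuous questionnaire outcomes.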