Are Today's LLMs Ready to Explain Well-Being Concepts?

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study investigates whether large language models (LLMs) can generate accurate, audience-adapted explanations of well-being concepts, tailored to users with varying expertise across three domains: mental health, physical well-being, and social welfare. Method: We construct a high-quality dataset of 43,880 explanations spanning these three domains and propose a principle-guided dual-LLM adjudication framework for automated evaluation, achieving high consistency with human judgments (ρ > 0.9). We further apply supervised fine-tuning (SFT) and direct preference optimization (DPO) to this specialized conceptual explanation task. Contribution/Results: Fine-tuned smaller models significantly outperform larger models that were not fine-tuned, and systematic performance disparities emerge across audience types and domain categories. Our work establishes a methodological foundation and empirical evidence for the trustworthy deployment of LLMs in explaining humanistic and social-scientific concepts.
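As a rough illustration of what such an adjudication setup can look like, here is a minimal sketch of a principle-guided dual-LLM-as-a-judge scorer, assuming an OpenAI-compatible chat API. The scoring principles, prompt wording, and judge model names are illustrative placeholders, not the paper's own rubric or models.

```python
# Minimal sketch of a principle-guided dual-LLM-as-a-judge evaluation.
# Assumes an OpenAI-compatible chat API; principles and judge names are
# placeholders, not the paper's actual rubric.
from statistics import mean
from openai import OpenAI

client = OpenAI()
JUDGE_MODELS = ("judge-model-a", "judge-model-b")  # hypothetical judge names

PRINCIPLES = (
    "Rate the explanation from 1 (poor) to 5 (excellent) on factual accuracy, "
    "fit to the stated audience, clarity, and completeness. "
    "Reply with a single integer."
)

def judge_score(model: str, concept: str, audience: str, explanation: str) -> float:
    """Ask one judge model for a principle-guided 1-5 quality score."""
    prompt = (f"{PRINCIPLES}\n\nConcept: {concept}\nAudience: {audience}\n"
              f"Explanation:\n{explanation}")
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic scoring
    )
    return float(resp.choices[0].message.content.strip())

def dual_judge(concept: str, audience: str, explanation: str) -> float:
    """Average the two judges' scores; the paper's exact adjudication rule may differ."""
    return mean(judge_score(m, concept, audience, explanation) for m in JUDGE_MODELS)
```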

📝 Abstract
Well-being encompasses mental, physical, and social dimensions essential to personal growth and informed life decisions. As individuals increasingly consult Large Language Models (LLMs) to understand well-being, a key challenge emerges: Can LLMs generate explanations that are not only accurate but also tailored to diverse audiences? High-quality explanations require both factual correctness and the ability to meet the expectations of users with varying expertise. In this work, we construct a large-scale dataset comprising 43,880 explanations of 2,194 well-being concepts, generated by ten diverse LLMs. We introduce a principle-guided LLM-as-a-judge evaluation framework, employing dual judges to assess explanation quality. Furthermore, we show that fine-tuning an open-source LLM using Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) can significantly enhance the quality of generated explanations. Our results reveal: (1) the proposed LLM judges align well with human evaluations; (2) explanation quality varies significantly across models, audiences, and categories; and (3) DPO- and SFT-finetuned models outperform their larger counterparts, demonstrating the effectiveness of preference-based learning for specialized explanation tasks.
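The reported judge-human agreement is a rank correlation, which is straightforward to reproduce; the snippet below shows the check with placeholder ratings, not the paper's data.

```python
# Minimal agreement check between human ratings and LLM-judge scores.
# The rating vectors here are illustrative placeholders, not the paper's data.
from scipy.stats import spearmanr

human_scores = [4, 5, 2, 3, 5, 1, 4]   # human quality ratings (1-5)
judge_scores = [4, 5, 3, 3, 5, 1, 4]   # aggregated dual-judge scores (1-5)

rho, p_value = spearmanr(human_scores, judge_scores)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3g})")  # paper reports rho > 0.9
```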
Problem

Research questions and friction points this paper is trying to address.

Assessing LLMs' ability to explain well-being concepts accurately
Evaluating tailored explanations for diverse audience expertise levels
Improving explanation quality via fine-tuning and preference-based learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale dataset with 43,880 well-being explanations
Principle-guided LLM-as-a-judge framework with dual judges
SFT and DPO fine-tuning enhances explanation quality (a minimal pipeline sketch follows this list)
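The code below sketches how such an SFT-then-DPO pipeline can be wired up using HuggingFace's trl library. The paper does not name its training stack, so trl itself, the base model, the data records, and the hyperparameters are all assumptions for illustration.

```python
# Minimal sketch of an SFT -> DPO pipeline using HuggingFace's `trl` library.
# trl is an assumption (the paper does not specify its stack); the base model,
# data, and hyperparameters are placeholders. Argument names (e.g.
# processing_class vs. tokenizer) vary across trl versions.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer, SFTConfig, SFTTrainer

base_model = "Qwen/Qwen2-0.5B-Instruct"  # placeholder open-source model

# Stage 1: SFT on high-quality explanations rendered as plain text.
sft_data = Dataset.from_list([
    {"text": "Explain 'resilience' to a layperson.\n\nResilience is the "
             "capacity to recover from stress or adversity..."},
])
sft_trainer = SFTTrainer(
    model=base_model,
    args=SFTConfig(output_dir="sft-wellbeing"),
    train_dataset=sft_data,
)
sft_trainer.train()
sft_trainer.save_model("sft-wellbeing")

# Stage 2: DPO on preference pairs, where the higher-judged explanation
# (e.g., per the dual-LLM judges) is "chosen" and the lower one "rejected".
model = AutoModelForCausalLM.from_pretrained("sft-wellbeing")
tokenizer = AutoTokenizer.from_pretrained(base_model)
dpo_data = Dataset.from_list([
    {
        "prompt": "Explain 'resilience' to a clinician.",
        "chosen": "Resilience denotes dynamic, multi-level adaptation to stressors...",
        "rejected": "Resilience: see the psychological literature.",
    },
])
dpo_trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-wellbeing", beta=0.1),
    train_dataset=dpo_data,
    processing_class=tokenizer,
)
dpo_trainer.train()
```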