Evaluating Search Engines and Large Language Models for Answering Health Questions

📅 2024-07-17
🤖 AI Summary
This study systematically compares search engines and large language models (LLMs) on health-related question answering, focusing on accuracy, reliability, and explainability, three dimensions critical for clinical safety.

Method: We introduce the first multidimensional health evaluation framework, assessing clinical plausibility, evidence traceability, and user interpretability. A human-annotated gold-standard answer set is constructed via expert review, fact-checking API validation, and inter-annotator consistency scoring. Evaluation employs both retrieval-augmented generation and zero-shot prompting paradigms.

Contribution/Results: LLMs exhibit a 38% error rate in medication recommendations, whereas search engines outperform them in initial symptom triage. To mitigate risks, we propose a credibility calibration strategy that improves LLMs’ clinical compliance by 27%. Our core contribution is the first comprehensive, medical-domain-specific evaluation framework, empirically delineating the risk boundaries of LLMs in health applications and identifying concrete optimization pathways.

Problem

Research questions and friction points this paper is trying to address.

Compare search engines and LLMs for health question accuracy.
Assess impact of retrieval-augmented methods on LLM performance.
Evaluate sensitivity of LLMs to input prompts in health queries.
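
The three questions above can be illustrated with a small scoring sketch. It is not the authors' evaluation code: the record layout, the system and prompt names, and the assumption that answers are short labels such as "yes" or "no" are all hypothetical. It computes accuracy per system and per prompt variant, then uses the gap between the best and worst prompt as one simple proxy for prompt sensitivity.

```python
from collections import defaultdict


def accuracy_by_system_and_prompt(records):
    """Return {system: {prompt_variant: accuracy}}.

    Each record is a hypothetical tuple:
    (system, prompt_variant, question_id, predicted_label, gold_label).
    """
    correct = defaultdict(lambda: defaultdict(int))
    total = defaultdict(lambda: defaultdict(int))
    for system, prompt, _question_id, predicted, gold in records:
        total[system][prompt] += 1
        if predicted == gold:
            correct[system][prompt] += 1
    return {
        system: {p: correct[system][p] / n for p, n in prompts.items()}
        for system, prompts in total.items()
    }


def prompt_sensitivity(per_prompt_accuracy):
    """Gap between the best and worst prompt variant (an assumed proxy)."""
    values = list(per_prompt_accuracy.values())
    return max(values) - min(values)


# Toy data, invented purely for illustration.
records = [
    ("llm_a", "plain", 1, "yes", "yes"),
    ("llm_a", "plain", 2, "no", "yes"),
    ("llm_a", "expert", 1, "yes", "yes"),
    ("llm_a", "expert", 2, "yes", "yes"),
    ("search_engine", "n/a", 1, "yes", "yes"),
    ("search_engine", "n/a", 2, "no", "yes"),
]

scores = accuracy_by_system_and_prompt(records)
for system, per_prompt in scores.items():
    print(system, per_prompt, "sensitivity:", prompt_sensitivity(per_prompt))
```

A wider gap for a given system suggests its answers depend more heavily on how the question is phrased. Other sensitivity measures, such as variance across prompts, would work equally well here.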

Innovation

Methods, ideas, or system contributions that make the work stand out.

Compares search engines (SEs), LLMs, and retrieval-augmented generation (RAG) for health question answering.
LLMs achieve 80% accuracy but are sensitive to the input prompt.
RAG boosts the accuracy of smaller LLMs by 30%.
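
As a rough illustration of the retrieval-augmented setup mentioned above, the sketch below prepends retrieved passages to a health question before it is sent to a model. The function name `build_rag_prompt`, the prompt wording, and the `max_passages` cutoff are assumptions made for this example. The summary does not describe the prompt template the authors used.

```python
def build_rag_prompt(question, passages, max_passages=3):
    """Build a prompt that places retrieved passages ahead of the question.

    `passages` is assumed to be a list of strings already ranked by a
    search engine; only the top `max_passages` are kept.
    """
    selected = passages[:max_passages]
    evidence = "\n".join(
        f"[{i}] {text}" for i, text in enumerate(selected, start=1)
    )
    return (
        "Use the evidence below to answer the health question.\n"
        f"Evidence:\n{evidence}\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder passages, not real retrieval results.
passages = ["passage one", "passage two", "passage three", "passage four"]
prompt = build_rag_prompt("Does vitamin C prevent colds?", passages)
print(prompt)
```

Dropping the evidence block from the same template gives a zero-shot prompt, so the two conditions differ only in the retrieved context.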

Marcos Fernández-Pichel
PhD, CiTIUS (Universidade de Santiago de Compostela)
IR, Machine Learning, NLP
J. C. Pichel
Centro Singular de Investigación en Tecnoloxías Intelixentes (CiTIUS), Universidade de Santiago (USC), Rúa de Jenaro de la Fuente s/n, Santiago de Compostela, 15705, Spain
David E. Losada
Full Professor, Universidad de Santiago de Compostela
Computer Science, Artificial Intelligence, Information Retrieval, Information Systems