PERCS: Persona-Guided Controllable Biomedical Summarization Dataset

📅 2025-12-03
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing biomedical text simplification research predominantly targets generic audiences, neglecting the heterogeneity in users’ health literacy and information needs. Method: We introduce the first controllable biomedical summarization dataset tailored to four distinct audience personas—laypersons, medical students, non-medical researchers, and medical experts—and propose a novel persona-aware summarization framework, validated by clinical experts for factual accuracy and audience appropriateness. Quality assurance integrates human annotation, expert review, and fine-grained error categorization, complemented by LLM-based automated evaluation of readability, completeness, and faithfulness. Contribution/Results: We publicly release a multi-level summary dataset, annotation guidelines, and an evaluation benchmark. Empirical analysis confirms statistically significant differences across personas in lexical choice, knowledge depth, and readability, establishing a reproducible, controllable simplification baseline.

Technology Category

Application Category

📝 Abstract
Automatic medical text simplification plays a key role in improving health literacy by making complex biomedical research accessible to diverse readers. However, most existing resources assume a single generic audience, overlooking the wide variation in medical literacy and information needs across user groups. To address this limitation, we introduce PERCS (Persona-guided Controllable Summarization), a dataset of biomedical abstracts paired with summaries tailored to four personas: Laypersons, Premedical Students, Non-medical Researchers, and Medical Experts. These personas represent different levels of medical literacy and information needs, emphasizing the need for targeted, audience-specific summarization. Each summary in PERCS was reviewed by physicians for factual accuracy and persona alignment using a detailed error taxonomy. Technical validation shows clear differences in readability, vocabulary, and content depth across personas. Along with describing the dataset, we benchmark four large language models on PERCS using automatic evaluation metrics that assess comprehensiveness, readability, and faithfulness, establishing baseline results for future research. The dataset, annotation guidelines, and evaluation materials are publicly available to support research on persona-specific communication and controllable biomedical summarization.
Problem

Research questions and friction points this paper is trying to address.

Develops a dataset for persona-specific biomedical text summarization
Addresses lack of audience-tailored medical simplification resources
Evaluates models on readability and accuracy for diverse user groups
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dataset with persona-tailored biomedical summaries
Physician-reviewed summaries using error taxonomy
Benchmarked large language models with evaluation metrics
🔎 Similar Papers
No similar papers found.
R
Rohan Charudatt Salvi
Department of Computer Science, University of Illinois, Chicago, IL, 60607, USA
C
Chirag Chawla
Indian Institute of Technology, Varanasi, India
Dhruv Jain
Dhruv Jain
Assistant Professor at University of Michigan
Human-Computer InteractionAccessible ComputingHuman-Centered AIDeaf and Hard of Hearing
S
Swapnil Panigrahi
Department of Computer Science & Engineering, Indraprastha Institute of Information Technology, Delhi, New Delhi, 110020, India
M
Md. Shad Akhtar
Department of Computer Science & Engineering, Indraprastha Institute of Information Technology, Delhi, New Delhi, 110020, India
S
Shweta Yadav
Department of Computer Science, University of Illinois, Chicago, IL, 60607, USA