Divide-Prompt-Refine: a Training-Free, Structure-Aware Framework for Biomedical Abstract Generation

📅 2026-05-19
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the large number of biomedical full-text articles that lack abstracts, a gap that limits their usefulness for information retrieval and knowledge discovery. The authors propose DPR-BAG, a training-free, zero-shot framework for generating abstracts. DPR-BAG first segments the full text into rhetorical facets following the Background-Objective-Methods-Results-Conclusions (BOMRC) schema, then uses a large language model to summarize each facet in parallel, and finally applies a global refinement step to restore discourse coherence. An ablation study shows that increasing prompt complexity or explicitly injecting entity-level guidance can degrade factual alignment, which motivates the framework's controlled prompting strategy. Evaluated on PMC-MAD, a dataset of 46,309 biomedical articles, DPR-BAG improves abstractive novelty over strong extractive and fine-tuned baselines while maintaining factual consistency.
📝 Abstract
Biomedical abstracts play a critical role in downstream NLP applications, such as information retrieval, biocuration, and biomedical knowledge discovery. However, a non-trivial number of biomedical articles do not have abstracts, diminishing the utility of these articles for downstream tasks. We propose DPR-BAG (Divide, Prompt, and Refine for Biomedical Abstract Generation), a training-free, zero-shot framework that generates coherent and factually grounded abstracts for biomedical articles with full text but no abstract. DPR-BAG decomposes full-text documents into structured rhetorical facets following the Background-Objective-Methods-Results-Conclusions (BOMRC) schema, performs parallel LLM-based summarization for each facet, and applies a final refinement stage to restore global discourse coherence. On PMC-MAD, a distribution-aligned dataset of 46,309 biomedical articles, DPR-BAG improves abstractive novelty over strong extractive and fine-tuned baselines, while maintaining factual consistency. Our ablation study reveals a counterintuitive finding: increasing prompt complexity or explicitly injecting entity-level guidance can degrade factual alignment, highlighting the importance of controlled prompting strategies. These findings underscore the potential of training-free, structure-aware frameworks for scalable biomedical abstract generation in low-resource settings. Our data and code are available at https://huggingface.co/datasets/pmc-mad/PMC-MAD and https://github.com/ScienceNLP-Lab/MultiTagger-v2/tree/main/DPR-BAG.
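The three stages named in the abstract (divide, prompt, refine) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the heading-keyword map, the prompt wording, the function names, and the `llm` callable (any function that maps a prompt string to a completion string) are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

FACETS = ["Background", "Objective", "Methods", "Results", "Conclusions"]

# Hypothetical keyword map from section headings to BOMRC facets (an assumption,
# not the segmentation rule used in the paper).
HEADING_HINTS = {
    "introduction": "Background",
    "background": "Background",
    "objective": "Objective",
    "method": "Methods",
    "material": "Methods",
    "result": "Results",
    "discussion": "Conclusions",
    "conclusion": "Conclusions",
}


def divide(sections: List[Tuple[str, str]]) -> Dict[str, str]:
    """Divide: group (heading, text) pairs into facets by heading keyword."""
    grouped: Dict[str, List[str]] = {name: [] for name in FACETS}
    for heading, text in sections:
        lowered = heading.lower()
        for hint, facet in HEADING_HINTS.items():
            if hint in lowered:
                grouped[facet].append(text)
                break  # first matching hint wins; unmatched sections are dropped
    return {name: " ".join(parts) for name, parts in grouped.items() if parts}


def prompt_facets(facets: Dict[str, str], llm: Callable[[str], str]) -> Dict[str, str]:
    """Prompt: summarize each facet independently, in parallel threads."""
    def summarize(item: Tuple[str, str]) -> Tuple[str, str]:
        name, text = item
        request = (
            f"Summarize the {name} of this biomedical article in one or two "
            f"sentences, using only the text below.\n\n{text}"
        )
        return name, llm(request)

    with ThreadPoolExecutor(max_workers=len(FACETS)) as pool:
        return dict(pool.map(summarize, facets.items()))


def refine(summaries: Dict[str, str], llm: Callable[[str], str]) -> str:
    """Refine: merge facet summaries into one abstract with a final LLM pass."""
    draft = "\n".join(f"{name}: {summaries[name]}" for name in FACETS if name in summaries)
    return llm(
        "Rewrite these facet summaries as one coherent abstract "
        "without adding new facts.\n\n" + draft
    )


def generate_abstract(sections: List[Tuple[str, str]], llm: Callable[[str], str]) -> str:
    """Run divide, prompt, and refine in sequence."""
    return refine(prompt_facets(divide(sections), llm), llm)
```

Keeping each prompt short and single-purpose mirrors the ablation finding reported above, that more complex prompts can degrade factual alignment; the exact prompts used by DPR-BAG are in the linked repository, not in this sketch.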
Problem

Research questions and friction points this paper is trying to address.

biomedical abstract generation
missing abstracts
downstream NLP applications
factual consistency
low-resource settings
Innovation

Methods, ideas, or system contributions that make the work stand out.

training-free
structure-aware
biomedical abstract generation
zero-shot
rhetorical facet decomposition
Sylvey Lin
School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL
Joe Menke
School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL
Shufan Ming
School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL
Dongin Nam
School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL
Neil Smalheiser
School of Information Sciences, University of Illinois Urbana-Champaign, Champaign, IL; Department of Psychiatry, University of Illinois College of Medicine, Chicago, IL
Halil Kilicoglu
Associate Professor, University of Illinois at Urbana-Champaign
Natural Language Processing, Information Extraction, Biomedical Informatics, Computational Semantics, Question Answering