🤖 AI Summary
The growth of online biomedical literature has made health information widely available, but complex domain-specific terminology keeps much of it out of reach for the general public. Existing simplification corpora lack fine-grained annotations, which hinders targeted modeling and rigorous evaluation. To address this, we propose JEBS, a term-level biomedical simplification task that decomposes lexical simplification into three sequential sub-tasks: (1) identifying complex biomedical terms, (2) classifying how each term should be replaced, and (3) generating the replacement text. We construct and publicly release the JEBS dataset, which comprises 21,595 annotated replacements for 10,314 terms across 400 abstracts. We also evaluate rule-based and transformer-based (BERT and T5) baselines for each sub-task. These baselines support the development and rigorous evaluation of interpretable biomedical terminology simplification.
📝 Abstract
Online medical literature has made health information more available than ever; however, complex medical jargon prevents the general public from understanding it. Though parallel and comparable corpora for Biomedical Text Simplification have been introduced, these conflate the many syntactic and lexical operations involved in simplification. To enable more targeted development and evaluation, we present a fine-grained lexical simplification task and dataset, Jargon Explanations for Biomedical Simplification (JEBS, https://github.com/bill-from-ri/JEBS-data). The JEBS task involves identifying complex terms, classifying how to replace them, and generating replacement text. The JEBS dataset contains 21,595 replacements for 10,314 terms across 400 biomedical abstracts and their manually simplified versions. Additionally, we provide baseline results for a variety of rule-based and transformer-based systems for the three sub-tasks. The JEBS task, data, and baseline results pave the way for the development and rigorous evaluation of systems that replace or explain complex biomedical terms.
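The three JEBS sub-tasks (identify a term, classify how to replace it, generate replacement text) can be chained into one pipeline. The sketch below shows this with a toy glossary-lookup baseline, roughly the kind of rule-based system the abstract mentions. Everything in it is a hypothetical assumption: the glossary entries, the replacement-type labels `substitute` and `explain`, and the function names. It is neither the authors' baseline code nor the JEBS data format.

```python
import re
from dataclasses import dataclass

# Hypothetical glossary: lowercase term -> (replacement type, replacement text).
# The labels "substitute" and "explain" are illustrative, not JEBS's official categories.
GLOSSARY = {
    "myocardial infarction": ("substitute", "heart attack"),
    "hypertension": ("substitute", "high blood pressure"),
    "randomized controlled trial": (
        "explain",
        "randomized controlled trial (a study in which participants "
        "are assigned to groups by chance)",
    ),
}


@dataclass
class Replacement:
    term: str   # surface form found in the text
    start: int  # character offset where the term begins
    end: int    # character offset just past the term
    kind: str   # sub-task 2 output: how to replace the term
    text: str   # sub-task 3 output: the replacement text


def identify_terms(text, glossary):
    """Sub-task 1: find glossary terms, longest first, case-insensitively,
    skipping any match that overlaps a span already claimed."""
    taken = [False] * len(text)
    spans = []
    for term in sorted(glossary, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if not any(taken[m.start():m.end()]):
                spans.append((m.start(), m.end(), term))
                taken[m.start():m.end()] = [True] * (m.end() - m.start())
    return sorted(spans)


def classify_and_generate(text, spans, glossary):
    """Sub-tasks 2 and 3: look up the replacement type and text for each span."""
    reps = []
    for start, end, term in spans:
        kind, repl = glossary[term]
        reps.append(Replacement(text[start:end], start, end, kind, repl))
    return reps


def apply_replacements(text, reps):
    """Rewrite the text right-to-left so earlier character offsets stay valid."""
    for r in sorted(reps, key=lambda r: r.start, reverse=True):
        text = text[:r.start] + r.text + text[r.end:]
    return text


if __name__ == "__main__":
    sentence = "Patients with hypertension had a higher risk of myocardial infarction."
    found = identify_terms(sentence, GLOSSARY)
    print(apply_replacements(sentence, classify_and_generate(sentence, found, GLOSSARY)))
```

A fixed glossary keeps each sub-task's output explicit and easy to inspect. A learned system would instead treat sub-task 1 as span tagging, sub-task 2 as classification, and sub-task 3 as text generation, and could then be scored against the annotated replacements stage by stage.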