LongBEL: Long-Context and Document-Consistent Biomedical Entity Linking

📅 2026-05-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing biomedical entity linking approaches: they link each mention independently and ignore dependencies among mentions within a document, which leads to inconsistent predictions when the same concept appears under different surface forms. The authors propose LongBEL, a document-level generative entity linking framework that combines local context, full-document context, and a memory of previous predictions. The memory is trained on cross-validated predictions rather than gold labels, which reduces the train–test mismatch and limits cascading errors. Evaluated on five benchmark datasets across English, French, and Spanish, LongBEL improves over sentence-level generative baselines, with the largest gains where concepts frequently recur within documents, and an ensemble of local, global, and memory-based variants achieves the best results on all benchmarks.
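The way a memory of earlier predictions can feed into later ones might be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `Mention` class, the `build_input` and `link_document` helpers, the `[M] ... [/M]` markers, the prompt layout, and the `predict` callable are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    start: int  # character offset where the mention begins
    end: int    # character offset where the mention ends (exclusive)


def build_input(document: str, mention: Mention,
                memory: list[tuple[str, str]], window: int = 40) -> str:
    """Assemble one model input from memory, full document, and local context.

    The layout is hypothetical; the source does not specify a prompt format.
    """
    surface = document[mention.start:mention.end]
    left = document[max(0, mention.start - window):mention.start]
    right = document[mention.end:mention.end + window]
    local = f"{left}[M] {surface} [/M]{right}"
    mem = "; ".join(f"{s} -> {c}" for s, c in memory) or "(empty)"
    return (f"memory: {mem}\n"
            f"document: {document}\n"
            f"local: {local}\n"
            f"mention: {surface}")


def link_document(document: str, mentions: list[Mention], predict):
    """Link mentions in reading order, feeding earlier predictions forward.

    `predict` is any callable mapping an input string to a concept identifier;
    it stands in for the generative model.
    """
    memory: list[tuple[str, str]] = []
    for m in sorted(mentions, key=lambda m: m.start):
        concept = predict(build_input(document, m, memory))
        memory.append((document[m.start:m.end], concept))
    return memory
```

Because each prediction is appended to `memory` before the next mention is processed, a later mention of the same concept under a different surface form sees the earlier decision in its input. It also shows why errors can cascade: a wrong early prediction is passed forward in exactly the same way.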

📝 Abstract

Biomedical entity linking maps textual mentions to concepts in structured knowledge bases such as UMLS or SNOMED CT. Most existing systems link each mention independently, using only the mention or its surrounding sentence. This ignores dependencies between mentions in the same document and can lead to inconsistent predictions, especially when the same concept appears under different surface forms. We introduce LongBEL, a document-level generative framework that combines full-document context with a memory of previous predictions. To make this memory robust, LongBEL is trained with cross-validated predictions rather than gold labels, reducing the mismatch between training and inference and limiting cascading errors. Experiments on five biomedical benchmarks across English, French, and Spanish show that LongBEL improves over sentence-level generative baselines, with the largest gains on datasets where concepts frequently recur within documents. An ensemble of local, global, and memory-based variants achieves the best results across all benchmarks. Further analysis shows that the largest gains occur on recurring concepts, suggesting that LongBEL mainly improves document-level consistency rather than isolated mention disambiguation.
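Training the memory on cross-validated predictions rather than gold labels can be illustrated with a simple out-of-fold scheme. The sketch below is an assumption about how such predictions might be produced, not the procedure described in the paper: the fold assignment, the `train_fn` and `predict_fn` callables, and the document dictionary layout are hypothetical.

```python
def cross_validated_predictions(documents, k, train_fn, predict_fn):
    """Return out-of-fold predictions for every training document.

    documents  -- list of dicts, each with at least an "id" key (assumed layout)
    k          -- number of folds
    train_fn   -- callable: list of documents -> model (hypothetical)
    predict_fn -- callable: (model, document) -> predictions (hypothetical)
    """
    # Round-robin split at the document level, so all mentions of one
    # document land in the same fold.
    folds = [documents[i::k] for i in range(k)]
    out_of_fold = {}
    for i, held_out in enumerate(folds):
        # Train on every fold except the held-out one.
        train_docs = [d for j, fold in enumerate(folds) if j != i for d in fold]
        model = train_fn(train_docs)
        # Predict only on documents the model has never seen.
        for doc in held_out:
            out_of_fold[doc["id"]] = predict_fn(model, doc)
    return out_of_fold
```

The resulting predictions contain the kind of mistakes a model makes on unseen documents. Filling the training-time memory with them, instead of with gold labels, exposes the final model to imperfect memory entries similar to those it will meet at inference.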

Problem

Research questions and friction points this paper is trying to address.

biomedical entity linking

document-level consistency

mention dependencies

concept recurrence

inconsistent predictions

Innovation

Methods, ideas, or system contributions that make the work stand out.

document-level entity linking

generative framework

memory-augmented prediction