Error-Aware Curriculum Learning for Biomedical Relation Classification

📅 2025-07-18

🤖 AI Summary

Biomedical relation classification (RC) is semantically complex, and the resulting prediction errors hinder knowledge graph construction and drug repurposing. To address this, we propose an error-aware teacher-student framework: GPT-4o serves as the teacher model, analyzing a baseline student's prediction failures to diagnose error types, assign difficulty scores, and generate sentence rewrites and knowledge-enhanced suggestions. To our knowledge, this work is the first to integrate fine-grained error-type analysis with difficulty-aware curriculum learning. We further construct a heterogeneous biomedical knowledge graph from PubMed abstracts to strengthen contextual modeling. Our method comprises instruction tuning, knowledge-graph-guided classification, and sentence-level data augmentation. Evaluated on five protein–protein interaction (PPI) datasets, one drug–drug interaction (DDI) dataset, and ChemProt, our approach achieves state-of-the-art performance on four of the five PPI benchmarks and on the DDI dataset, while remaining competitive on ChemProt. The difficulty-ordered curriculum is intended to promote robust, progressive learning across these biomedical RC tasks.

📝 Abstract

Relation Classification (RC) in biomedical texts is essential for constructing knowledge graphs and enabling applications such as drug repurposing and clinical decision-making. We propose an error-aware teacher-student framework that improves RC through structured guidance from a large language model (GPT-4o). Prediction failures from a baseline student model are analyzed by the teacher to classify error types, assign difficulty scores, and generate targeted remediations, including sentence rewrites and suggestions for KG-based enrichment. These enriched annotations are used to train a first student model via instruction tuning. This model then annotates a broader dataset with difficulty scores and remediation-enhanced inputs. A second student is subsequently trained via curriculum learning on this dataset, ordered by difficulty, to promote robust and progressive learning. We also construct a heterogeneous biomedical knowledge graph from PubMed abstracts to support context-aware RC. Our approach achieves new state-of-the-art performance on 4 of 5 PPI datasets and the DDI dataset, while remaining competitive on ChemProt.
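
The difficulty-ordered curriculum for the second student can be illustrated with a minimal Python sketch. The abstract only states that training data is ordered by difficulty; the cumulative staging scheme, the `Example` class, the `curriculum_stages` function, and the float difficulty scale are all assumptions made here for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Example:
    text: str
    label: str
    difficulty: float  # hypothetical score; higher means harder


def curriculum_stages(examples: List[Example], n_stages: int) -> Iterator[List[Example]]:
    """Yield cumulative training pools, easiest examples first.

    Assumption: each stage keeps the earlier (easier) examples and adds
    the next-harder slice. The paper may schedule its curriculum differently.
    """
    if n_stages < 1:
        raise ValueError("n_stages must be at least 1")
    ordered = sorted(examples, key=lambda ex: ex.difficulty)
    for stage in range(1, n_stages + 1):
        # Ceiling division so the final stage always covers every example.
        cutoff = -(-len(ordered) * stage // n_stages)
        yield ordered[:cutoff]
```

A training loop would iterate over the stages and fine-tune on each pool in turn, so the model sees easy cases before the hardest ones are introduced.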

Problem

Research questions and friction points this paper is trying to address.

Improving biomedical relation classification via error-aware learning

Enhancing knowledge graphs with context-aware relation classification

Achieving state-of-the-art performance on biomedical datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Error-aware teacher-student framework with GPT-4o

Curriculum learning ordered by difficulty on remediation-enhanced inputs

Heterogeneous knowledge graph from PubMed abstracts
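
To make the error-aware teacher-student idea more concrete, the sketch below shows one way the teacher's feedback (error type, difficulty score, sentence rewrite, KG suggestion) could be packed into an instruction-tuning record for the first student. Everything here is hypothetical: the `TeacherFeedback` fields, the example error-type label, the 1-5 difficulty scale, and the prompt wording are illustrative assumptions, and no call to GPT-4o is made.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TeacherFeedback:
    error_type: str          # hypothetical label, e.g. "negation_missed"
    difficulty: int          # hypothetical 1-5 scale; the source gives no scale
    rewrite: Optional[str]   # teacher's sentence rewrite, if one was produced
    kg_hint: Optional[str]   # teacher's suggested knowledge-graph fact, if any


def to_instruction_record(sentence: str, gold_label: str,
                          fb: TeacherFeedback) -> Dict[str, object]:
    """Build one instruction-tuning record from a failed prediction.

    Assumption: the rewrite replaces the original sentence when present,
    and the KG hint is appended as background context.
    """
    parts = [
        "Classify the relation between the marked entities.",
        f"Sentence: {fb.rewrite or sentence}",
    ]
    if fb.kg_hint:
        parts.append(f"Background: {fb.kg_hint}")
    return {
        "instruction": "\n".join(parts),
        "output": gold_label,
        "error_type": fb.error_type,
        "difficulty": fb.difficulty,
    }
```

Keeping `error_type` and `difficulty` alongside the prompt lets later steps filter or order records without parsing the instruction text.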
