🤖 AI Summary
Named entity recognition (NER) and health event extraction in the low-resource French biomedical domain suffer from severe annotation scarcity. Method: This paper proposes an LLM-driven collaborative framework that integrates automatic exemplar selection, annotation-guideline summaries injected into prompts, synthetic-data-augmented fine-tuning (leveraging GLiNER and LLaMA-3.1-8B-Instruct), and LLM-based post-validation. Contribution/Results: The approach tightly couples structured domain knowledge (the guidelines), high-quality synthetic data, and multi-stage LLM reasoning to substantially alleviate the annotation bottleneck. Under extreme few-shot settings, GPT-4.1 achieves 61.53% macro-F1 on NER and 15.02% F1 on health event extraction via in-context learning, demonstrating the efficacy of combining careful prompt engineering with post-processing. The framework offers a reusable methodological paradigm for information extraction in low-resource specialized domains.
📝 Abstract
This work presents our participation in the EvalLLM 2025 challenge on biomedical Named Entity Recognition (NER) and health event extraction in French (few-shot setting). For NER, we propose three approaches combining large language models (LLMs), annotation guidelines, synthetic data, and post-processing: (1) in-context learning (ICL) with GPT-4.1, incorporating automatic selection of 10 examples and a summary of the annotation guidelines into the prompt, (2) the universal NER system GLiNER, fine-tuned on a synthetic corpus and then verified by an LLM in post-processing, and (3) the open LLM LLaMA-3.1-8B-Instruct, fine-tuned on the same synthetic corpus. Event extraction uses the same ICL strategy with GPT-4.1, reusing the guideline summary in the prompt. Results show GPT-4.1 leads with a macro-F1 of 61.53% for NER and 15.02% for event extraction, highlighting the importance of well-crafted prompting to maximize performance in very low-resource scenarios.
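The ICL pipeline described above (automatic selection of 10 similar annotated examples plus a guideline summary in the prompt) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names (`select_exemplars`, `build_ner_prompt`) and the token-overlap similarity heuristic are assumptions; the paper does not specify its selection metric, and the assembled prompt would then be sent to GPT-4.1.

```python
def select_exemplars(query, pool, k=10):
    """Rank annotated examples by crude token overlap with the query.

    A stand-in for whatever similarity measure the paper actually uses
    (e.g. embedding cosine similarity); token overlap keeps the sketch
    dependency-free.
    """
    q_tokens = set(query.lower().split())
    scored = sorted(
        pool,
        key=lambda ex: -len(q_tokens & set(ex["text"].lower().split())),
    )
    return scored[:k]


def build_ner_prompt(query, pool, guideline_summary, k=10):
    """Assemble a few-shot NER prompt: guideline summary, k exemplars, query."""
    shots = select_exemplars(query, pool, k)
    lines = ["Annotation guidelines (summary):", guideline_summary, ""]
    for ex in shots:
        lines.append(f"Text: {ex['text']}")
        lines.append(f"Entities: {ex['entities']}")
        lines.append("")
    # The model is asked to complete the final "Entities:" line.
    lines.append(f"Text: {query}")
    lines.append("Entities:")
    return "\n".join(lines)
```

The same prompt skeleton, with event-schema instructions in place of entity labels, would cover the event-extraction track, since the paper reuses the guideline summary there.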