CareLab at #SMM4H-HeaRD 2025: Insomnia Detection and Food Safety Event Extraction with Domain-Aware Transformers

📅 2025-06-22

📈 Citations: 0

✨ Influential: 0

career value

188K/year

🤖 AI Summary

This study addresses two cross-domain information extraction tasks: insomnia symptom detection in clinical texts and food safety incident extraction in news articles. Methodologically, we propose a domain-aware Transformer framework built upon RoBERTa-style encoders, integrating three key innovations: (1) GPT-4–driven large language model (LLM) data augmentation, (2) task-adaptive input construction, and (3) domain-adaptive fine-tuning. These components collectively enhance domain-specific semantic modeling and few-shot generalization over generic models. Evaluated on SMM4H-HeaRD 2025 Task 5 Subtask 1 (food safety incident extraction), our approach achieves an F1 score of 0.958—ranking first—and attains state-of-the-art performance on the insomnia detection subtask. Our work empirically validates the efficacy of synergistic LLM augmentation and domain-informed encoding, establishing a reusable technical paradigm for low-resource information extraction in healthcare and public safety domains.

Technology Category

Application Category

📝 Abstract

This paper presents our system for the SMM4H-HeaRD 2025 shared tasks, specifically Task 4 (Subtasks 1, 2a, and 2b) and Task 5 (Subtasks 1 and 2). Task 4 focused on detecting mentions of insomnia in clinical notes, while Task 5 addressed the extraction of food safety events from news articles. We participated in all subtasks and report key findings across them, with particular emphasis on Task 5 Subtask 1, where our system achieved strong performance-securing first place with an F1 score of 0.958 on the test set. To attain this result, we employed encoder-based models (e.g., RoBERTa), alongside GPT-4 for data augmentation. This paper outlines our approach, including preprocessing, model architecture, and subtask-specific adaptations

Problem

Research questions and friction points this paper is trying to address.

Detect insomnia mentions in clinical notes

Extract food safety events from news articles

Improve performance using domain-aware transformer models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-aware transformers for insomnia detection

RoBERTa and GPT-4 for data augmentation

Preprocessing and model architecture adaptations

🔎 Similar Papers

No similar papers found.