CareLab at #SMM4H-HeaRD 2025: Insomnia Detection and Food Safety Event Extraction with Domain-Aware Transformers

📅 2025-06-22
🤖 AI Summary
This study addresses two information extraction tasks from different domains: insomnia detection in clinical notes and food safety event extraction from news articles. We propose a domain-aware Transformer framework built on RoBERTa-style encoders that combines three components: (1) GPT-4-based data augmentation, (2) task-adaptive input construction, and (3) domain-adaptive fine-tuning. Together, these components are intended to improve domain-specific semantic modeling and generalization from limited training data compared with generic models. On SMM4H-HeaRD 2025 Task 5 Subtask 1 (food safety event extraction), our approach achieves an F1 score of 0.958 on the test set and ranks first; results are also reported for the Task 4 insomnia detection subtasks. The work suggests that combining LLM-based augmentation with domain-informed encoding is a reusable approach for low-resource information extraction in healthcare and public safety.

📝 Abstract
This paper presents our system for the SMM4H-HeaRD 2025 shared tasks, specifically Task 4 (Subtasks 1, 2a, and 2b) and Task 5 (Subtasks 1 and 2). Task 4 focused on detecting mentions of insomnia in clinical notes, while Task 5 addressed the extraction of food safety events from news articles. We participated in all subtasks and report key findings across them, with particular emphasis on Task 5 Subtask 1, where our system achieved strong performance, securing first place with an F1 score of 0.958 on the test set. To attain this result, we employed encoder-based models (e.g., RoBERTa), alongside GPT-4 for data augmentation. This paper outlines our approach, including preprocessing, model architecture, and subtask-specific adaptations.
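For readers unfamiliar with the metric behind the reported 0.958, the following is a minimal sketch of how an F1 score is computed from gold and predicted labels. It assumes plain binary F1 on a single positive class; the averaging scheme used by the shared task is not stated here, and the function name `f1_score` and the toy labels are illustrative only, not taken from the paper.

```python
def f1_score(gold, pred, positive=1):
    """Binary F1 for one positive class (illustrative, not the official scorer)."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)  # true positives
    fp = sum(1 for g, p in pairs if g != positive and p == positive)  # false positives
    fn = sum(1 for g, p in pairs if g == positive and p != positive)  # false negatives
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only.
gold = [1, 0, 1, 1, 0, 1]
pred = [1, 0, 0, 1, 1, 1]
score = f1_score(gold, pred)
print(f"F1 = {score:.3f}")
```

Because F1 is a harmonic mean, a score as high as 0.958 requires both precision and recall to be high; a system cannot reach it by trading one for the other.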
Problem

Research questions and friction points this paper is trying to address.

Detect insomnia mentions in clinical notes
Extract food safety events from news articles
Improve performance using domain-aware transformer models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Domain-aware transformers for insomnia detection and food safety event extraction
RoBERTa-style encoders combined with GPT-4 data augmentation
Preprocessing, model architecture, and subtask-specific adaptations