🤖 AI Summary
This study addresses the pervasive underreporting of gender-based violence (GBV) incidents in primary care electronic health records (EHRs). We propose a transparent, efficient, and language-agnostic semantic-driven NLP framework. Methodologically, we construct eight fine-grained GBV retrieval patterns grounded in semantic role labeling and apply them to automatically identify potential underreported cases from 21 million sentences of open Portuguese-language text (Brazilian variant), followed by expert validation. Our key contributions are threefold: (1) the first application of an interpretable semantic framework to public health surveillance—ensuring model traceability, ethical compliance, and low-carbon computation; (2) elimination of reliance on large-scale annotated data; and (3) inherent cross-lingual transferability. Evaluation yields a precision of 0.726, significantly improving GBV detection rates. Results demonstrate robustness, scalability, and practical utility in real-world clinical settings.
📝 Abstract
We introduce a methodology for the identification of notifiable events in the domain of healthcare. The methodology harnesses semantic frames to define fine-grained patterns and search them in unstructured data, namely, open-text fields in e-medical records. We apply the methodology to the problem of underreporting of gender-based violence (GBV) in e-medical records produced during patients' visits to primary care units. A total of eight patterns are defined and searched on a corpus of 21 million sentences in Brazilian Portuguese extracted from e-SUS APS. The results are manually evaluated by linguists and the precision of each pattern measured. Our findings reveal that the methodology effectively identifies reports of violence with a precision of 0.726, confirming its robustness. Designed as a transparent, efficient, low-carbon, and language-agnostic pipeline, the approach can be easily adapted to other health surveillance contexts, contributing to the broader, ethical, and explainable use of NLP in public health systems.