🤖 AI Summary
Existing databases on firearm violence are incomplete, and in-person surveys carry safety risks, which limits the data that human rights organizations can collect to inform public policy and emergency responses. In partnership with a Brazilian human rights organization, this study fine-tunes a BERT-based model on Twitter (now X) texts to distinguish gun violence reports from ordinary Portuguese texts, and integrates it into the workflow of analysts who continuously fact-check social media posts to identify new gun violence events. The model achieves an AUC of 0.97. It was deployed in a web application and tested in a live intervention, with a mixed-method evaluation that combined quantitative analysis of analyst interactions with qualitative interviews. Qualitatively, the tool helped all analysts use their time more efficiently and expanded their search capacities; quantitatively, use of the model was associated with more interactions between analysts and online users reporting gun violence. The findings suggest that modern NLP techniques can support the data collection work of human rights organizations.
📝 Abstract
Gun violence is a pressing and growing human rights issue that affects nearly every dimension of the social fabric, from healthcare and education to psychology and the economy. Reliable data on firearm events is paramount to developing more effective public policy and emergency responses. However, the lack of comprehensive databases and the risks of in-person surveys prevent human rights organizations from collecting the data they need in most countries. Here, we partner with a Brazilian human rights organization to conduct a systematic evaluation of language models to assist with monitoring real-world firearm events from social media data. We propose a fine-tuned BERT-based model trained on Twitter (now X) texts to distinguish gun violence reports from ordinary Portuguese texts. Our model achieves a high AUC score of 0.97. We then incorporate our model into a web application and test it in a live intervention. We study and interview Brazilian analysts who continuously fact-check social media texts to identify new gun violence events. Qualitative assessments show that our solution helped all analysts use their time more efficiently and expanded their search capacities. Quantitative assessments show that the use of our model was associated with more interactions between analysts and online users reporting gun violence. Taken together, our findings suggest that modern Natural Language Processing techniques can help support the work of human rights organizations.
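
For readers less familiar with the reported metric, the following is a minimal sketch of what an AUC score measures: the probability that a randomly chosen positive example (here, a gun violence report) receives a higher classifier score than a randomly chosen negative one. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the authors' evaluation code or data.

```python
def roc_auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = gun violence report, 0 = ordinary text.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so a score of 0.97 means the classifier ranks a report above an ordinary text in the large majority of such pairs.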