🤖 AI Summary
This study addresses cross-lingual identification of news articles that report the same critical event (e.g., natural disasters, terrorist attacks) to enable global comparative news analysis. We propose FAME, an unsupervised framework built on a lightweight “event fingerprint” matching paradigm: it uses standardized temporal, geospatial, and categorical event attributes and needs no training data, which supports multilingual, large-scale, and real-time event-to-news alignment. FAME integrates NLP-based metadata matching, cross-lingual event normalization, and heterogeneous external databases (MediaCloud, EM-DAT, USGS, GTD). Applied to 470 real-world events from 2020, FAME retrieved 27,441 news articles in three languages. Empirical analysis found that news coverage intensity correlates with fatality counts, host-country GDP, and bilateral trade volume. This work establishes a scalable, zero-shot foundational framework for cross-lingual, event-driven news research.
📝 Abstract
Comparative studies of news coverage are challenging to conduct because methods to identify news articles about the same event in different languages require expertise that is difficult to scale. We introduce an AI-powered method for identifying news articles based on an event fingerprint, which is a minimal set of metadata required to identify critical events. Our event coverage identification method, Fingerprint to Article Matching for Events (FAME), efficiently identifies news articles about critical world events, specifically terrorist attacks and several types of natural disasters. FAME does not require training data and automatically identifies news articles that discuss an event given its fingerprint: time, location, and class (such as storm or flood). The method achieves state-of-the-art performance and scales to massive databases of tens of millions of news articles and hundreds of events happening globally. We use FAME to identify 27,441 articles that cover 470 natural disaster and terrorist attack events that happened in 2020. To this end, we use a massive database of news articles in three languages from MediaCloud, and three widely used, expert-curated databases of critical events: EM-DAT, USGS, and GTD. Our case study reveals patterns consistent with prior literature: coverage of disasters and terrorist attacks correlates with death counts, with the GDP of the country where the event occurs, and with trade volume between the reporting country and the country where the event occurred. We share our NLP annotations and cross-country media attention data to support the efforts of researchers and media monitoring organizations.
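To make the fingerprint idea concrete, here is a minimal sketch of matching articles against a fingerprint of time, location, and class. It is not the FAME implementation. The names `EventFingerprint`, `Article`, `matches`, and `find_articles`, the seven-day publication window, and the use of hand-listed multilingual keyword sets in place of NLP-based metadata extraction are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class EventFingerprint:
    """Hypothetical container for the three fingerprint fields."""
    day: date                   # time: date the event occurred
    location_terms: frozenset   # location: lowercase place names, any language
    class_terms: frozenset      # class: lowercase event-type words, any language


@dataclass(frozen=True)
class Article:
    published: date
    text: str


def matches(fp: EventFingerprint, article: Article, window_days: int = 7) -> bool:
    """Return True if the article satisfies all three fingerprint fields.

    Assumption: an article counts as covering the event only if it was
    published within `window_days` after the event and mentions at least
    one location term and at least one class term.
    """
    last_day = fp.day + timedelta(days=window_days)
    if not (fp.day <= article.published <= last_day):
        return False
    # \w+ is Unicode-aware for str patterns, so accented words stay whole.
    tokens = set(re.findall(r"\w+", article.text.lower()))
    return bool(tokens & fp.location_terms) and bool(tokens & fp.class_terms)


def find_articles(fp: EventFingerprint, articles: list) -> list:
    """Filter a collection of articles down to those matching the fingerprint."""
    return [a for a in articles if matches(fp, a)]
```

In this sketch no model is trained: the only inputs are the fingerprint fields and the article's date and text, which mirrors the abstract's claim that no training data is required. A real system would replace the keyword sets with extracted and normalized locations, dates, and event classes.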