Rewrite the News: Tracing Editorial Reuse Across News Agencies

📅 2026-03-31
🤖 AI Summary
This study addresses non-literal, cross-lingual editorial reuse in multilingual news reporting, a common but underexplored phenomenon. Traditional approaches struggle to detect it because they rely on full-document translation, which leaves journalists to cope with information overload. The authors propose a weakly supervised method that detects sentence-level cross-lingual reuse without requiring complete translations and uses publication timestamps to trace the earliest plausible source. Combining cross-lingual sentence alignment with temporal ordering, the approach identifies reuse in 52% of 1,037 articles from the Slovenian Press Agency and yields 1,087 earliest-source sentence pairs. Reuse is predominantly non-literal, involving paraphrase and synthesis from multiple sources, and it tends to appear in the middle and end of articles while leads are more often original, indicating that simple lexical matching overlooks substantial editorial reuse.
📝 Abstract
This paper investigates sentence-level text reuse in multilingual journalism, analyzing where reused content occurs within articles. We present a weakly supervised method for detecting sentence-level cross-lingual reuse without requiring full translations, designed to support automated pre-selection to reduce information overload for journalists (Holyst et al., 2024). The study compares English-language articles from the Slovenian Press Agency (STA) with reports from 15 foreign agencies (FA) in seven languages, using publication timestamps to retain the earliest likely foreign source for each reused sentence. We analyze 1,037 STA and 237,551 FA articles from two time windows (October 7-November 2, 2023; February 1-28, 2025) and identify 1,087 aligned sentence pairs after filtering to the earliest sources. Reuse occurs in 52% of STA articles and 1.6% of FA articles and is predominantly non-literal, involving paraphrase and compositional reuse from multiple sources. Reused content tends to appear in the middle and end of English articles, while leads are more often original, indicating that simple lexical matching overlooks substantial editorial reuse. Compared with prior work focused on monolingual overlap, we (i) detect reuse across languages without requiring full translation, (ii) use publication timing to identify likely sources, and (iii) analyze where reused material is situated within articles. Dataset and code: https://github.com/kunturs/lrec2026-rewrite-news.
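The earliest-source step described in the abstract can be illustrated with a minimal sketch. It assumes candidate sentence pairs have already been scored by some cross-lingual similarity model; the `Candidate` fields, the `earliest_sources` helper, and the 0.8 threshold are hypothetical names and values chosen for illustration, not the authors' implementation (see the linked repository for that).

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Candidate:
    """One candidate alignment between an STA sentence and a foreign-agency (FA) sentence.

    All field names are illustrative assumptions, not taken from the paper's code.
    """
    sta_sentence_id: str
    fa_sentence_id: str
    fa_agency: str
    sta_published: datetime
    fa_published: datetime
    similarity: float  # assumed to come from a multilingual sentence encoder


def earliest_sources(candidates, min_similarity=0.8):
    """Keep, for each STA sentence, the earliest FA sentence that could be its source.

    A candidate is plausible only if it is similar enough (threshold is an
    assumed value) and was published strictly before the STA article.
    """
    best = {}
    for c in candidates:
        if c.similarity < min_similarity:
            continue  # too dissimilar to count as reuse
        if c.fa_published >= c.sta_published:
            continue  # a source cannot appear after the article that reuses it
        current = best.get(c.sta_sentence_id)
        if current is None or c.fa_published < current.fa_published:
            best[c.sta_sentence_id] = c
    return best
```

Calling `earliest_sources` on a list of `Candidate` objects returns a dictionary keyed by STA sentence ID, so each reused sentence maps to at most one foreign sentence. Filtering on timestamps before comparing candidates keeps later republications of the same wire copy from being mistaken for the origin.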
Problem

Research questions and friction points this paper is trying to address.

text reuse
cross-lingual journalism
sentence-level reuse
editorial reuse
multilingual news
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-lingual text reuse
weakly supervised detection
publication timestamp
sentence-level alignment
editorial rewriting
Soveatin Kuntur
Warsaw University of Technology, Poland
Nina Smirnova
GESIS – Leibniz Institute for the Social Sciences, Germany
Anna Wroblewska
Warsaw University of Technology, Poland
Philipp Mayr
GESIS – Leibniz Institute for the Social Sciences, Germany
Research interests: Interactive Information Retrieval, Informetrics, Digital libraries, Information Seeking, Dataset Search
Sebastijan Razboršek Maček
Slovenian Press Agency, Slovenia