🤖 AI Summary
This work proposes an end-to-end framework that helps users evaluate the credibility of online news. The system first uses a large language model (LLM) to generate diverse, critical questions about a given news claim, applying semantic filtering and clustering to enforce question diversity. It then retrieves relevant evidence from large-scale corpora such as MS MARCO V2.1, aided by several query expansion strategies, including a reasoning-based Chain-of-Thought expansion. Retrieved passages are re-ranked with monoT5 and further screened by an LLM relevance judge, culminating in a citation-backed credibility report. Official evaluation results show that Chain-of-Thought query expansion and re-ranking substantially improve both evidence relevance and domain-level trustworthiness over baseline retrieval, while the quality of generated questions remains moderate, leaving room for improvement.
📝 Abstract
The DRAGUN Track at TREC 2025 targets the growing need for effective support tools that help users evaluate the trustworthiness of online news. We describe the UR_Trecking system submitted for both Task 1 (critical question generation) and Task 2 (retrieval-augmented trustworthiness reporting). Our approach combines LLM-based question generation with semantic filtering, diversity enforcement using clustering, and several query expansion strategies (including reasoning-based Chain-of-Thought expansion) to retrieve relevant evidence from the MS MARCO V2.1 segmented corpus. Retrieved documents are re-ranked with a monoT5 model and filtered by an LLM relevance judge together with a domain-level trustworthiness dataset. For Task 2, selected evidence is synthesized by an LLM into concise trustworthiness reports with citations. Results from the official evaluation indicate that Chain-of-Thought query expansion and re-ranking substantially improve both relevance and domain trust compared to baseline retrieval, while question-generation performance shows moderate quality with room for improvement. We conclude by outlining key challenges encountered and suggesting directions for enhancing robustness and trustworthiness assessment in future iterations of the system.
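The diversity-enforcement step described above (clustering candidate questions and keeping one representative per cluster) could be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation: the toy 2-D embeddings, the tiny k-means, and all function names are assumptions standing in for the real sentence-embedding model and clustering setup.

```python
# Hypothetical sketch: cluster candidate-question embeddings and keep
# one representative question per cluster to enforce diversity.
# Toy 2-D embeddings replace real sentence embeddings.
from math import dist

def kmeans(points, k, iters=20):
    # Deterministic init: first k points serve as initial centroids.
    centroids = [list(p) for p in points[:k]]
    assign = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        assign = [min(range(k), key=lambda c: dist(p, centroids[c]))
                  for p in points]
        # Recompute each centroid as the mean of its cluster members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(x) / len(members) for x in zip(*members)]
    return centroids, assign

def diverse_representatives(questions, embeddings, k):
    # Keep the question closest to each cluster centroid.
    centroids, assign = kmeans(embeddings, k)
    reps = []
    for c in range(k):
        idxs = [i for i, a in enumerate(assign) if a == c]
        if idxs:
            reps.append(min(idxs, key=lambda i: dist(embeddings[i], centroids[c])))
    return [questions[i] for i in sorted(reps)]

# Near-duplicate questions share nearby embeddings; only one survives per cluster.
questions = ["Who is the source?", "When was it published?",
             "Is there evidence?", "Who wrote it?", "What date?"]
embeds = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0), (0.1, 0.0), (5.1, 5.0)]
picked = diverse_representatives(questions, embeds, k=3)
```

With these toy embeddings, the two "who" questions and the two "date" questions collapse into single clusters, so `picked` contains three mutually distinct questions.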