🤖 AI Summary
This work addresses the retrieval of scientific sources for short, multilingual social media texts, where semantically similar distractors often degrade performance. The authors propose a cluster-aware, stage-aware hard-negative mining approach within a pipeline that combines dense retrieval, multilingual cross-encoder reranking, and large language model (LLM)-based evidence selection. By exploiting the semantic cluster structure of the candidate pool, the method distinguishes local cluster negatives from broader non-gold semantic negatives and builds hard negatives suited to each stage. Constrained prompting, in particular constrained classification, is used to make LLM-based evidence selection more reliable. On CheckThat! 2026 Task 1, the system ranked 6th among 37 submissions. The experiments indicate that localized cluster negatives favor precision-oriented retrieval, while broader semantic negatives give stronger candidate coverage and more consistent reranking across languages.
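To make the distinction between the two negative types concrete, the following is a minimal sketch, not the authors' implementation. It assumes cluster labels for the candidate pool are already available (for example from k-means over document embeddings), that "local" negatives are non-gold candidates sharing the gold document's cluster, and that "global" negatives are the most query-similar non-gold candidates regardless of cluster. The function name `mine_hard_negatives`, the parameter `k`, and the toy vectors are all hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mine_hard_negatives(
    query_vec: Vector,
    candidates: Dict[str, Vector],  # doc_id -> embedding (retrieved pool)
    clusters: Dict[str, int],       # doc_id -> cluster label (assumed given)
    gold_id: str,
    k: int = 2,
) -> Dict[str, List[str]]:
    """Return up to k local-cluster and k global-semantic negatives."""
    # Rank every non-gold candidate by similarity to the query.
    ranked = sorted(
        (doc_id for doc_id in candidates if doc_id != gold_id),
        key=lambda doc_id: cosine(query_vec, candidates[doc_id]),
        reverse=True,
    )
    gold_cluster = clusters[gold_id]
    # Local negatives: restricted to the gold document's cluster.
    local = [doc_id for doc_id in ranked if clusters[doc_id] == gold_cluster][:k]
    # Global negatives: best-ranked non-gold candidates from any cluster.
    global_negatives = ranked[:k]
    return {"local_cluster": local, "global_semantic": global_negatives}


# Toy pool with two clusters; "gold" is the correct source paper.
pool = {
    "gold": [0.9, 0.1],
    "p1": [0.8, 0.2],
    "p2": [0.7, 0.7],
    "p3": [0.95, 0.05],
    "p4": [0.0, 1.0],
}
labels = {"gold": 0, "p1": 0, "p2": 1, "p3": 1, "p4": 0}
negatives = mine_hard_negatives([1.0, 0.0], pool, labels, gold_id="gold")
print(negatives)
```

In this toy pool the two sets differ: the local set stays inside the gold cluster even when a member is only weakly similar to the query, while the global set follows query similarity across clusters. Which set is used to train which stage is a design decision the sketch leaves open.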
📝 Abstract
Identifying the scientific source behind a social media claim requires matching short, informal, and often multilingual texts against large collections of scientific publications, where semantically related papers may act as challenging distractors or false negatives during training. We present our submission to CheckThat! 2026 Task 1 on multilingual scientific-source retrieval, focusing on how hard-negative mining should be adapted to multi-stage retrieval pipelines. We propose cluster-aware hard-negative mining strategies that exploit the semantic structure of retrieved candidate pools to construct more informative training negatives for dense retrieval and reranking. Our experiments show that different hard-negative structures induce different retrieval behaviors: localized cluster negatives tend to favor precision-oriented retrieval, whereas broader non-gold semantic negatives provide stronger candidate coverage and more consistent reranking performance across languages. We further study multiple LLM-based evidence-selection formulations, including direct classification, pairwise comparison, and listwise reranking prompts, and find that constrained classification prompts provide the most reliable final document selection. The final system combines a dense retriever, a multilingual cross-encoder reranker, and a selective LLM-based disagreement resolver, ranking 6th among 37 submissions in the shared task evaluation. Overall, our results suggest that hard-negative mining should be treated as a stage-aware design problem rather than as a single retrieval optimization strategy.
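The selective disagreement resolver can be pictured with a short sketch under stated assumptions. The abstract does not specify when the LLM is invoked or how its output is constrained, so the version below assumes that the LLM is consulted only when the dense retriever and the reranker disagree on the top document, that it must pick one ID from a small closed option set (a constrained classification), and that any answer outside that set falls back to the reranker's choice. The names `select_document`, `resolver`, and `top_n` are hypothetical, and `resolver` stands in for an actual LLM call.

```python
from typing import Callable, List


def select_document(
    dense_ranking: List[str],
    reranked: List[str],
    resolver: Callable[[List[str]], str],
    top_n: int = 3,
) -> str:
    """Pick the final document, calling the resolver only on disagreement."""
    # If both stages agree on the top document, skip the LLM entirely.
    if dense_ranking[0] == reranked[0]:
        return reranked[0]
    # Closed option set: top-n of each stage, deduplicated, order preserved.
    options = list(dict.fromkeys(reranked[:top_n] + dense_ranking[:top_n]))
    choice = resolver(options)
    # Enforce the constraint: reject anything outside the option set.
    return choice if choice in options else reranked[0]


def always_second(options: List[str]) -> str:
    """Stand-in for an LLM call; returns the second option it is shown."""
    return options[1]


# Stages disagree, so the resolver is consulted over the merged option set.
final = select_document(["a", "b", "c"], ["b", "a", "d"], always_second)
print(final)
```

Validating the resolver's answer against the option set keeps a malformed or hallucinated ID from reaching the final output, which is one plausible reading of why a constrained formulation would be more reliable than free-form prompts.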