CroSearch-R1: Better Leveraging Cross-lingual Knowledge for Retrieval-Augmented Generation

📅 2026-04-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a weakness of multilingual retrieval-augmented generation (RAG): simply concatenating knowledge retrieved in different languages often fails to help, because of disparities across languages. The authors propose CroSearch-R1, a search-augmented reinforcement learning framework built on Group Relative Policy Optimization (GRPO). It uses multi-turn retrieval with cross-lingual knowledge integration to align evidence from other languages in a unified representation space, and a multilingual rollout mechanism to improve the transfer of reasoning across languages. The authors report that the framework exploits cross-lingual complementarity and improves the effectiveness of RAG on multilingual collections.
📝 Abstract
A multilingual collection may contain useful knowledge in other languages to supplement and correct the facts in the original language for Retrieval-Augmented Generation (RAG). However, the vanilla approach that simply concatenates multiple pieces of knowledge from different languages into the context may fail to improve effectiveness due to the potential disparities across languages. To better leverage multilingual knowledge, we propose CroSearch-R1, a search-augmented reinforcement learning framework to integrate multilingual knowledge into the Group Relative Policy Optimization (GRPO) process. In particular, the approach adopts a multi-turn retrieval strategy with cross-lingual knowledge integration to dynamically align the knowledge from other languages as supplementary evidence into a unified representation space. Furthermore, we introduce a multilingual rollout mechanism to optimize reasoning transferability across languages. Experimental results demonstrate that our framework effectively leverages cross-lingual complementarity and improves the effectiveness of RAG with multilingual collections.
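For readers unfamiliar with GRPO, the group-relative step it is named for can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the use of the population standard deviation, the `eps` term, and the idea that one group holds rollouts searching in different languages are all assumptions made for the example, and the reward values are invented.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group.

    GRPO scores a rollout relative to the other rollouts sampled for the
    same prompt, instead of using a learned value function.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations vary
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group: four rollouts for one question, each tagged with the
# language it searched in. Rewards are invented (1.0 = correct answer).
rollouts = [("en", 1.0), ("zh", 0.0), ("fr", 1.0), ("ar", 0.0)]
advantages = group_relative_advantages([reward for _, reward in rollouts])

for (lang, reward), adv in zip(rollouts, advantages):
    print(f"{lang}: reward={reward} advantage={adv:.3f}")
```

Rollouts that beat the group average receive a positive advantage and the rest a negative one, so a group that mixes languages would push the policy toward whichever search behavior answered correctly, regardless of language.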
Problem

Research questions and friction points this paper is trying to address.

Retrieval-Augmented Generation
Cross-lingual Knowledge
Multilingual Collections
Knowledge Integration
Language Disparities
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-lingual Knowledge Integration
Retrieval-Augmented Generation
Reinforcement Learning
Multilingual Rollout
Multi-turn Retrieval
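
The multi-turn retrieval idea tagged above can be illustrated with a toy loop that issues one query per turn against per-language collections and accumulates the evidence. Everything here is hypothetical: the token-overlap retriever, the function names, the fixed list of queries (standing in for queries a policy model would generate turn by turn), and the two tiny corpora are invented for the sketch and are not taken from the paper.

```python
def retrieve(corpus, query, k=1):
    """Toy lexical retriever: rank passages by how many query tokens they share."""
    q_tokens = set(query.lower().split())
    scored = []
    for passage in corpus:
        overlap = len(q_tokens & set(passage.lower().split()))
        if overlap > 0:
            scored.append((overlap, passage))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:k]]


def multi_turn_search(queries_by_turn, corpora):
    """Run one retrieval per turn and collect evidence tagged with its language.

    queries_by_turn is a list of (lang, query) pairs; in a trained system
    these would come from the model, here they are fixed by hand.
    """
    evidence = []
    for lang, query in queries_by_turn:
        for passage in retrieve(corpora[lang], query):
            if (lang, passage) not in evidence:  # skip duplicates across turns
                evidence.append((lang, passage))
    return evidence


# Invented example data: the German passage adds a fact the English one lacks.
corpora = {
    "en": ["The bridge opened in 1932.", "The river is 300 km long."],
    "de": ["Die Brücke wurde 1932 eröffnet und 1998 renoviert."],
}
turns = [("en", "when did the bridge open"), ("de", "Brücke renoviert")]
evidence = multi_turn_search(turns, corpora)

for lang, passage in evidence:
    print(f"[{lang}] {passage}")
```

The second turn retrieves supplementary evidence from another language, which is the kind of complementarity the abstract describes. How CroSearch-R1 aligns such evidence in a unified representation space is not shown here.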