🤖 AI Summary
This work addresses a problem in multilingual retrieval-augmented generation (mRAG): a fixed retrieval space often produces linguistic or cultural mismatches for culturally sensitive queries, which degrades answer accuracy. To mitigate this, the authors propose CORAL, a culture-aware adaptive retrieval method in which an agent-driven iterative loop dynamically refines both the retrieval corpus and the query formulation. The approach evaluates retrieved evidence against two criteria, cultural relevance and content relevance. When the evidence is insufficient, it re-selects the source corpora or rewrites the query. Notably, this is the first framework to build explicit cultural alignment into the joint refinement of retrieval corpora and queries in mRAG. On two culturally grounded question-answering benchmarks, the method substantially improves performance for low-resource languages, with accuracy gains of up to 3.58 percentage points over the strongest baselines.
📝 Abstract
Multilingual retrieval-augmented generation (mRAG) is often implemented within a fixed retrieval space, typically via query or document translation or multilingual embeddings. However, this approach may be inadequate for culturally grounded queries, where retrieval-condition misalignment can occur. Even strong retrievers and generators may struggle to produce culturally relevant answers when they draw evidence from inappropriate linguistic or regional contexts. To address this, we introduce CORAL (COntext-aware Retrieval with Agentic Loop), an adaptive retrieval methodology for mRAG that iteratively refines both the retrieval space (corpora) and the retrieval probe (query) based on the quality of the evidence. The process consists of (1) selecting corpora, (2) retrieving documents, (3) critiquing the evidence for relevance and cultural alignment, and (4) checking sufficiency. If the retrieved documents are insufficient to answer the query correctly, the system (5) reselects corpora and rewrites the query. Across two cultural QA benchmarks, CORAL achieves accuracy improvements of up to 3.58 percentage points on low-resource languages relative to the strongest baselines.
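The five-step loop can be illustrated with a minimal Python sketch of its control flow. Everything in it is a hypothetical stand-in and not the authors' implementation. The `Corpus` type, the helper names, the token-overlap retriever, the 0.3 overlap threshold, and the corpus rule (the first round searches corpora in the query's language, later rounds switch to untried corpora from the query's region) are all assumptions for illustration. In CORAL these decisions are made by an agent. The sketch shows only the structure: a passage counts as evidence only if it passes both content relevance and cultural alignment, and an insufficient round triggers a query rewrite plus corpus reselection.

```python
from dataclasses import dataclass, field

STRIP = ".,?!:;()'\""


@dataclass
class Corpus:
    name: str
    lang: str    # language tag of the corpus (hypothetical field)
    region: str  # cultural/regional context the corpus covers (hypothetical field)
    docs: list[str] = field(default_factory=list)


def tokens(text: str) -> set[str]:
    """Lowercased word set with basic punctuation stripped."""
    return {w.strip(STRIP).lower() for w in text.split()} - {""}


def select_corpora(corpora, query_lang, region, tried):
    """Steps 1 and 5 (assumed rule): round one uses corpora in the query's language;
    later rounds switch to untried corpora that match the query's region."""
    if not tried:
        return [c for c in corpora if c.lang == query_lang]
    return [c for c in corpora if c.region == region and c.name not in tried]


def retrieve(query, selected, k=3):
    """Step 2: rank documents by token overlap with the query (toy retriever)."""
    q = tokens(query)
    hits = [(len(q & tokens(d)) / max(len(q), 1), c, d) for c in selected for d in c.docs]
    return sorted(hits, key=lambda h: h[0], reverse=True)[:k]


def critique(hits, region, min_overlap=0.3):
    """Step 3: keep a hit only if it passes BOTH checks: content relevance
    (overlap >= threshold) and cultural alignment (corpus region == query region)."""
    return [(c.name, d) for s, c, d in hits if s >= min_overlap and c.region == region]


def rewrite_query(query, region):
    """Step 5 (assumed rule): make the cultural context explicit in the query."""
    return query if region.lower() in query.lower() else f"{query} ({region})"


def adaptive_retrieve(query, query_lang, region, corpora, min_evidence=1, max_rounds=3):
    """Run select -> retrieve -> critique -> sufficiency check, looping on failure."""
    tried: set[str] = set()
    for round_no in range(1, max_rounds + 1):
        selected = select_corpora(corpora, query_lang, region, tried)
        evidence = critique(retrieve(query, selected), region)
        if len(evidence) >= min_evidence:  # Step 4: sufficiency check
            return evidence, query, round_no
        tried |= {c.name for c in selected}
        query = rewrite_query(query, region)  # Step 5: rewrite; reselect next round
    return [], query, max_rounds


if __name__ == "__main__":
    # Toy corpora; English text stands in for the regional documents.
    corpora = [
        Corpus("en-wiki", "en", "global", ["New Year is celebrated on January 1 with fireworks."]),
        Corpus("ko-web", "ko", "Korea", ["In Korea, families eat tteokguk on New Year to gain a year of age."]),
    ]
    evidence, final_query, rounds = adaptive_retrieve(
        "What food do families eat on New Year?", "en", "Korea", corpora)
    print(rounds, evidence[0][0])  # → 2 ko-web
```

In this toy run, the first round searches only the English corpus. Its passage overlaps the query topically but comes from a culturally misaligned source, so the critique rejects it. The second round uses the rewritten query against the Korea-tagged corpus and stops once one aligned passage is found.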