Towards Geo-Culturally Grounded LLM Generations

📅 2025-02-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses gaps in the cultural knowledge that generative large language models (LLMs) display across national cultures. It compares standard LLMs against two retrieval-augmented generation (RAG) strategies: knowledge-base (KB) grounding, which retrieves from a bespoke knowledge base, and search grounding, which retrieves from a web search. The evaluation combines multiple-choice benchmarks that test *cultural propositional knowledge* (e.g., the norms, artifacts, and institutions of national cultures) with a human evaluation of *open-ended cultural fluency*. Search grounding significantly improves accuracy on the propositional knowledge benchmarks, but it also increases the risk of stereotypical judgments and does not improve human-rated cultural familiarity. KB grounding is limited by inadequate knowledge base coverage and a suboptimal retriever. The results highlight a gap between multiple-choice benchmark performance and open-ended cultural fluency when evaluating the cultural familiarity of LLMs.

📝 Abstract

Generative large language models (LLMs) have been demonstrated to have gaps in diverse, cultural knowledge across the globe. We investigate the effect of retrieval augmented generation and search-grounding techniques on the ability of LLMs to display familiarity with a diverse range of national cultures. Specifically, we compare the performance of standard LLMs, LLMs augmented with retrievals from a bespoke knowledge base (i.e., KB grounding), and LLMs augmented with retrievals from a web search (i.e., search grounding) on a series of cultural familiarity benchmarks. We find that search grounding significantly improves the LLM performance on multiple-choice benchmarks that test propositional knowledge (e.g., the norms, artifacts, and institutions of national cultures), while KB grounding's effectiveness is limited by inadequate knowledge base coverage and a suboptimal retriever. However, search grounding also increases the risk of stereotypical judgments by language models, while failing to improve evaluators' judgments of cultural familiarity in a human evaluation with adequate statistical power. These results highlight the distinction between propositional knowledge about a culture and open-ended cultural fluency when it comes to evaluating the cultural familiarity of generative LLMs.
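The three conditions compared in the abstract (standard, KB grounding, search grounding) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration only: the toy knowledge base `TOY_KB`, the token-overlap retriever `kb_retrieve`, the prompt template, and the `search_fn` callable standing in for a web search are all hypothetical and are not taken from the paper, which does not specify its retriever or prompts in this summary.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set

# Hypothetical toy knowledge base; the paper's actual KB is not described here.
TOY_KB: List[str] = [
    "In Japan, it is customary to remove shoes before entering a home.",
    "In Ethiopia, coffee is traditionally served in a ceremony with three rounds.",
    "In Brazil, Carnival is celebrated before Lent.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase whitespace tokens with trailing punctuation stripped."""
    return {word.strip(".,?!").lower() for word in text.split()}


def kb_retrieve(query: str, k: int = 1) -> List[str]:
    """Assumed retriever: rank KB passages by token overlap with the query."""
    query_tokens = tokenize(query)
    ranked = sorted(
        TOY_KB,
        key=lambda passage: len(query_tokens & tokenize(passage)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(
    question: str,
    choices: Sequence[str],
    passages: Optional[Sequence[str]] = None,
) -> str:
    """Format a multiple-choice prompt, prepending retrieved passages if given."""
    lines: List[str] = []
    if passages:
        lines.append("Context:")
        lines.extend(f"- {passage}" for passage in passages)
    lines.append(f"Question: {question}")
    for label, choice in zip("ABCD", choices):
        lines.append(f"{label}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)


def prompts_for_conditions(
    question: str,
    choices: Sequence[str],
    search_fn: Callable[[str], List[str]],
) -> Dict[str, str]:
    """Build one prompt per condition: standard, KB-grounded, search-grounded."""
    return {
        "standard": build_prompt(question, choices),
        "kb_grounded": build_prompt(question, choices, kb_retrieve(question)),
        # search_fn is a placeholder for a web search call (hypothetical).
        "search_grounded": build_prompt(question, choices, search_fn(question)),
    }


def accuracy(predictions: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of multiple-choice answers that match the gold labels."""
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)
```

In this sketch the only difference between conditions is the context placed ahead of the question, so any change in multiple-choice accuracy can be attributed to the retrieved passages. Note that a score of this kind measures propositional knowledge only; it says nothing about the open-ended fluency that the abstract evaluates with human raters.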

Problem

Research questions and friction points this paper is trying to address.

Addressing cultural knowledge gaps in LLMs

Comparing retrieval augmented generation techniques

Evaluating cultural familiarity in generative LLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval augmented generation

Search-grounding techniques

Cultural familiarity benchmarks
