Teaching LLMs to Abstain across Languages via Multilingual Feedback

📅 2024-06-22
🏛️ Conference on Empirical Methods in Natural Language Processing
📈 Citations: 2
Influential: 0
🤖 AI Summary
To mitigate hallucination in multilingual large language models (LLMs) arising from knowledge disparities in low-resource languages, this paper proposes a multilingual reflective feedback mechanism: the model self-reflects on a proposed answer by generating feedback in semantically related languages, which helps it identify knowledge gaps and proactively abstain from answering. The authors first show that directly transferring existing English-centric abstention methods leaves performance gaps of up to 20.5% between high- and low-resource languages. They then design a lightweight, model-agnostic adaptation framework compatible with both black-box and open-source LLMs, and evaluate it on a multilingual suite covering open-book, closed-book, and commonsense question answering. Experiments demonstrate that the method improves abstention accuracy for low-resource languages by up to 9.2% across the three QA tasks while enhancing model calibration and serving diverse language speakers more equitably; further analysis reveals that cultural factors strongly influence language selection and abstention behavior.

📝 Abstract
Multilingual LLMs often have knowledge disparities across languages, with larger gaps in under-resourced languages. Teaching LLMs to abstain in the face of knowledge gaps is thus a promising strategy to mitigate hallucinations in multilingual settings. However, previous studies on LLM abstention primarily focus on English; we find that directly applying existing solutions beyond English results in up to 20.5% performance gaps between high- and low-resource languages, potentially due to LLMs’ drop in calibration and reasoning beyond a few resource-rich languages. To this end, we propose strategies to enhance LLM abstention by learning from multilingual feedback, where LLMs self-reflect on proposed answers in one language by generating multiple feedback items in related languages: we show that this helps identify knowledge gaps across diverse languages, cultures, and communities. Extensive experiments demonstrate that our multilingual feedback approach outperforms various strong baselines, achieving up to 9.2% improvement for low-resource languages across three black-box and open models on three datasets, featuring open-book, closed-book, and commonsense QA. Further analysis reveals that multilingual feedback is both an effective and a more equitable abstention strategy for serving diverse language speakers, and that cultural factors have a great impact on language selection and LLM abstention behavior, highlighting future directions for multilingual and multi-cultural reliable language modeling.
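The feedback-then-abstain loop described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `ask_model` stands in for any black-box or open LLM call, and the feedback prompt, related-language codes, and majority threshold are assumptions for demonstration.

```python
def multilingual_feedback_abstain(question, proposed_answer, related_langs,
                                  ask_model, threshold=0.5):
    """Return (abstain, support_ratio).

    For each related language, ask the model to judge the proposed answer
    in that language; abstain when the share of supporting feedback falls
    below `threshold`, signaling a likely knowledge gap.
    """
    votes = []
    for lang in related_langs:
        # Hypothetical feedback prompt; the paper's actual prompts differ.
        prompt = (f"[{lang}] Question: {question}\n"
                  f"Proposed answer: {proposed_answer}\n"
                  f"Reply SUPPORT or REFUTE.")
        votes.append(ask_model(prompt).strip().upper().startswith("SUPPORT"))
    support_ratio = sum(votes) / len(votes)
    return support_ratio < threshold, support_ratio


# Toy stand-in model: feedback in two of three languages refutes the answer,
# so the majority vote triggers abstention.
def toy_model(prompt):
    return "REFUTE" if "[sw]" in prompt or "[yo]" in prompt else "SUPPORT"

abstain, ratio = multilingual_feedback_abstain(
    "Who wrote 'Things Fall Apart'?", "Wole Soyinka",
    related_langs=["sw", "yo", "ha"], ask_model=toy_model)
```

Aggregating feedback across several related languages, rather than re-asking in the question's own language, is what lets the check draw on knowledge the model holds unevenly across languages.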
Problem

Research questions and friction points this paper is trying to address.

Multilingual LLMs hold uneven knowledge across languages, with the largest gaps in under-resourced ones.
English-centric abstention methods transfer poorly, leaving up to 20.5% performance gaps between high- and low-resource languages.
How to teach LLMs to abstain reliably and equitably for speakers of low-resource languages.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual feedback: the model self-reflects on a proposed answer by generating feedback in related languages.
Cross-lingual self-reflection surfaces knowledge gaps across diverse languages, cultures, and communities.
Lightweight, model-agnostic framework yielding up to 9.2% abstention gains for low-resource languages across three models and three QA datasets.