🤖 AI Summary
This paper identifies a geographic fairness deficit in AI-powered information-seeking assistants: responses to queries about concepts from high-income countries are rated significantly more helpful (+12.7%), revealing structural biases in global knowledge representation. Method: we introduce "helpfulness" as a novel fairness metric specific to dialogue systems; construct a geographically aware helpfulness evaluation dataset via crowdsourced human annotation; train BERT/DeBERTa models for automated helpfulness assessment; and conduct cross-regional thematic experiments and statistical bias analyses on mainstream systems (e.g., ChatGPT). Contribution/Results: an actionable fairness diagnostic framework that enables the identification, quantification, and mitigation of implicit geographic bias in AI assistants, providing both methodological foundations and empirical evidence for equitable AI development.
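A minimal sketch of the automated-evaluator step described above, assuming a standard sequence-classification setup: a BERT-style model fine-tuned on human-annotated (query, response) pairs to predict a helpfulness level. This is not the authors' released code; the checkpoint, the 3-level label scheme, the toy examples, and the hyperparameters are illustrative assumptions.

```python
import torch
from torch.utils.data import DataLoader, Dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "bert-base-uncased"  # the paper also uses DeBERTa, e.g. "microsoft/deberta-v3-base"
NUM_LEVELS = 3                    # hypothetical granularity: not / partially / fully helpful

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LEVELS)

class HelpfulnessDataset(Dataset):
    """(query, response, human helpfulness label) triples."""
    def __init__(self, examples):
        self.examples = examples
    def __len__(self):
        return len(self.examples)
    def __getitem__(self, idx):
        query, response, label = self.examples[idx]
        # Encode query and response as a sentence pair for the classifier.
        enc = tokenizer(query, response, truncation=True, max_length=256,
                        padding="max_length", return_tensors="pt")
        item = {k: v.squeeze(0) for k, v in enc.items()}
        item["labels"] = torch.tensor(label)
        return item

# Toy annotated examples; the real dataset is crowdsourced at scale.
train_data = HelpfulnessDataset([
    ("What is the capital of Ghana?", "The capital of Ghana is Accra.", 2),
    ("What is the capital of Ghana?", "I'm not sure about that.", 0),
])
loader = DataLoader(train_data, batch_size=2, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for batch in loader:
    outputs = model(**batch)  # computes cross-entropy loss over helpfulness levels
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Once trained, the scorer can rate any system's responses automatically, which is what makes the large-scale cross-regional comparison feasible.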
📝 Abstract
Information-seeking AI assistant systems aim to answer users' knowledge queries in a timely manner. However, both the human-perceived helpfulness of these systems and its fairness implications are under-explored. In this paper, we study computational measurements of helpfulness. We collect human annotations on the helpfulness of dialogue responses, develop models for automatic helpfulness evaluation, and then propose using a dialogue system's helpfulness levels across different user queries to gauge its fairness. Experiments with state-of-the-art dialogue systems, including ChatGPT, under three information-seeking scenarios reveal that existing systems tend to be more helpful for questions about concepts from highly-developed countries than for those from less-developed countries, uncovering potential fairness concerns underlying current information-seeking assistant systems.
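To make the fairness gauge concrete, the sketch below compares per-query helpfulness scores (as produced by an evaluator like the one above) between two country groups and tests whether the gap is significant. The two-sample t-test is an illustrative choice, not necessarily the paper's exact procedure, and the scores are hypothetical placeholders rather than reported results.

```python
from statistics import mean
from scipy import stats

# Hypothetical evaluator scores, grouped by the development level of the
# country each query concerns (placeholder values, not the paper's data).
scores_developed = [0.91, 0.88, 0.95, 0.79, 0.86]
scores_less_developed = [0.74, 0.69, 0.81, 0.66, 0.72]

# Fairness gap: difference in mean helpfulness between the two groups.
gap = mean(scores_developed) - mean(scores_less_developed)
t_stat, p_value = stats.ttest_ind(scores_developed, scores_less_developed)

print(f"helpfulness gap: {gap:+.3f} (t = {t_stat:.2f}, p = {p_value:.4f})")
# A significant positive gap indicates the system is systematically more
# helpful for queries about highly-developed countries.
```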