AI Summary
This work addresses the fairness challenges that arise in large language models (LLMs) when user input is insufficient: such models rely on distributional knowledge to infer preferences, often amplifying majority viewpoints while marginalizing minority perspectives. Although frequent active questioning can enhance personalization, it imposes an excessive cognitive burden on users. The paper presents the first formal model of the trade-off between information inference and active questioning in generative AI, integrating game theory and information economics to develop an interactive theoretical framework. It proposes an optimal questioning strategy based on preference correlation and theoretically demonstrates that moderate questioning significantly reduces inference bias. Empirical experiments corroborate the model's predictions, offering mechanism design principles for building generative AI systems that balance efficiency with inclusivity.
Abstract
Generative AI models differ from traditional machine learning tools in that they allow users to provide as much or as little information as they choose in their inputs. This flexibility often leads users to omit certain details, relying on the models to infer and fill in under-specified information based on distributional knowledge of user preferences. Such inferences may privilege majority viewpoints and disadvantage users with atypical preferences, raising concerns about fairness. Unlike more traditional recommender systems, LLMs can explicitly solicit more information from users through natural language. However, while directly eliciting user preferences could increase personalization and mitigate inequality, excessive querying places a burden on users who value efficiency. We develop a stylized model of user-LLM interaction and an objective that captures the tradeoff between user burden and preference representation. Building on the observation that individual preferences are often correlated, we analyze how AI systems should balance inference and elicitation, characterizing the optimal amount of information to solicit before content generation. Ultimately, we show that information elicitation can mitigate the systematic biases of preference inference, enabling the design of generative tools that better incorporate diverse user perspectives while maintaining efficiency. We complement this theoretical analysis with an empirical evaluation illustrating the model's predictions and exploring their practical implications.
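To make the inference-versus-elicitation tradeoff concrete, the following is a minimal toy sketch in Python. It is not the paper's model: the two-type population, the binary attributes, the correlation parameter `rho`, the per-question cost, and all function names are illustrative assumptions. Each user has `n` binary preferences that agree with their type's typical value with probability `rho`; the system asks about `k` of them, updates its belief about the user's type by Bayes' rule, and guesses the rest.

```python
from math import comb


def posterior_minority(m, k, prior_min, rho):
    """Posterior probability that the user is the minority type, given that
    m of the k elicited answers match the minority's typical value.
    (Toy two-type setup; all parameters are assumptions.)"""
    like_min = rho**m * (1 - rho) ** (k - m)
    like_maj = (1 - rho) ** m * rho ** (k - m)
    num = prior_min * like_min
    return num / (num + (1 - prior_min) * like_maj)


def expected_errors(k, n, prior_min, rho, user_is_minority):
    """Expected number of wrongly inferred attributes among the n - k
    attributes that were not asked about, for a user of the given type."""
    # Probability that any single attribute takes the minority-typical value.
    p_one = rho if user_is_minority else 1 - rho
    total = 0.0
    for m in range(k + 1):
        p_m = comb(k, m) * p_one**m * (1 - p_one) ** (k - m)
        post = posterior_minority(m, k, prior_min, rho)
        # Predictive probability that an unasked attribute is minority-typical.
        prob_one = post * rho + (1 - post) * (1 - rho)
        guess_one = prob_one > 0.5
        # The guess is wrong whenever the true attribute differs from it.
        err = (1 - p_one) if guess_one else p_one
        total += p_m * (n - k) * err
    return total


def objective(k, n, prior_min, rho, cost):
    """Population-average inference error plus a linear querying burden."""
    err_min = expected_errors(k, n, prior_min, rho, True)
    err_maj = expected_errors(k, n, prior_min, rho, False)
    return prior_min * err_min + (1 - prior_min) * err_maj + cost * k


def best_k(n, prior_min, rho, cost):
    """Number of questions minimizing the toy objective (brute force)."""
    return min(range(n + 1), key=lambda k: objective(k, n, prior_min, rho, cost))


if __name__ == "__main__":
    n, prior_min, rho, cost = 10, 0.3, 0.8, 0.5
    for k in range(0, 5):
        gap = (expected_errors(k, n, prior_min, rho, True)
               - expected_errors(k, n, prior_min, rho, False))
        print(k, round(gap, 4), round(objective(k, n, prior_min, rho, cost), 4))
    print("best k:", best_k(n, prior_min, rho, cost))
```

In this toy setup, asking nothing means the system guesses the majority-typical value everywhere, so minority users absorb most of the error. Asking a few questions lets the correlation between answers identify the user's type and narrows that gap, while the per-question cost keeps the minimizing `k` well below `n`.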