🤖 AI Summary
This work addresses the limited capability of large language models (LLMs) to actively and adaptively collect information across multi-turn interactions. We propose a sequential Bayesian experimental design framework that explicitly models and updates the LLM's internal belief state as a probability distribution. Our approach introduces: (1) a probabilistic representation of the LLM's latent beliefs; (2) a carefully designed estimator for expected information gain (EIG) that does not rely solely on in-context updates to condition on previous responses; and (3) a targeted strategy for generating candidate queries within the sequential design loop, enabling adaptive query selection at each turn. Unlike heuristic or prompt-engineering–based methods, our framework enables principled, uncertainty-aware information acquisition. Evaluated on 20-Questions games and user preference inference tasks, it significantly outperforms direct prompting and existing adaptive baselines, achieving higher information-collection efficiency and improved downstream task performance.
📝 Abstract
We propose a general-purpose approach for improving the ability of Large Language Models (LLMs) to intelligently and adaptively gather information from a user or other external source using the framework of sequential Bayesian experimental design (BED). This enables LLMs to act as effective multi-turn conversational agents and interactively interface with external environments. Our approach, which we call BED-LLM (Bayesian Experimental Design with Large Language Models), is based on iteratively choosing questions or queries that maximize the expected information gain (EIG) about the task of interest given the responses gathered previously. We show how this EIG can be formulated in a principled way using a probabilistic model derived from the LLM's belief distribution and provide detailed insights into key decisions in its construction. Further key to the success of BED-LLM are a number of specific innovations, such as a carefully designed estimator for the EIG, not solely relying on in-context updates for conditioning on previous responses, and a targeted strategy for proposing candidate queries. We find that BED-LLM achieves substantial gains in performance across a wide range of tests based on the 20-questions game and using the LLM to actively infer user preferences, compared to direct prompting of the LLM and other adaptive design strategies.