🤖 AI Summary
This study investigates the root causes of multi-turn interaction necessity in LLM-based question answering, focusing on two dynamic problem attributes: incompleteness and ambiguity. We propose the first neuro-symbolic framework to automatically infer these attributes from interaction logs and establish their causal relationships with required interaction turns and answer correctness. Our key contributions are threefold: (1) We formally define incompleteness and ambiguity as computable, evolution-aware, interaction-driven properties, not static question features; (2) Using a controllably constructed benchmark dataset and human-annotated experiments, we empirically validate that high incompleteness or ambiguity significantly increases turn requirements, while effective interaction systematically reduces both attributes; (3) Our metrics accurately characterize and predict LLM QA performance, providing both theoretical grounding and practical tools for interpretable human-AI collaboration.
📝 Abstract
Natural language has long been anticipated as a medium for human-computer interaction, and the field has undergone a sea change with the advent of Large Language Models (LLMs), which have startling capacities for processing and generating language. Many of us now treat LLMs as modern-day oracles, asking them almost any kind of question. Unlike its Delphic predecessor, consulting an LLM does not have to be a single-turn activity (ask a question, receive an answer, leave); and -- also unlike the Pythia -- it is widely acknowledged that answers from LLMs can be improved with additional context. In this paper, we aim to study when we need multi-turn interactions with LLMs to successfully get a question answered, or to conclude that a question is unanswerable. We present a neuro-symbolic framework that models the interactions between human and LLM agents. Through the proposed framework, we define incompleteness and ambiguity in questions as properties deducible from the messages exchanged in the interaction, and provide results from benchmark problems, in which answer correctness is shown to depend on whether or not questions demonstrate the presence of incompleteness or ambiguity (according to the properties we identify). Our results show that multi-turn interactions are usually required for datasets with a high proportion of incomplete or ambiguous questions, and that increasing interaction length has the effect of reducing incompleteness or ambiguity. The results also suggest that our measures of incompleteness and ambiguity can be useful tools for characterising interactions with an LLM on question-answering problems.
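To make the central idea concrete -- that incompleteness and ambiguity are properties deduced from the exchanged messages, which evolve as the interaction proceeds -- here is a minimal toy sketch. It is not the paper's neuro-symbolic framework: the `Turn` type, the `deduce_attributes` function, and the keyword cues are all illustrative assumptions standing in for the actual deduction procedure.

```python
from dataclasses import dataclass

# Illustrative keyword cues an LLM clarifying turn might carry;
# the real framework deduces these attributes symbolically, not by keywords.
INCOMPLETENESS_CUES = ("provide", "missing", "need more", "could you specify")
AMBIGUITY_CUES = ("which", "do you mean", "or do you")

@dataclass
class Turn:
    role: str   # "user" or "llm"
    text: str

def deduce_attributes(log):
    """Deduce (incomplete, ambiguous) from an interaction log.

    A turn where the LLM asks for information it lacks marks the question
    incomplete; one where it asks the user to choose among readings marks
    it ambiguous.  A subsequent user reply clears the pending attribute,
    so the result depends on how much of the interaction has happened:
    the attributes are interaction-driven, not static question features."""
    incomplete = ambiguous = False
    for t in log:
        if t.role == "llm":
            low = t.text.lower()
            if any(cue in low for cue in INCOMPLETENESS_CUES):
                incomplete = True
            if any(cue in low for cue in AMBIGUITY_CUES):
                ambiguous = True
        else:
            # A user reply addresses the pending clarification request.
            incomplete = ambiguous = False
    return incomplete, ambiguous

log = [
    Turn("user", "Book me a flight."),
    Turn("llm", "Could you provide the departure city and date?"),
    Turn("user", "From Boston, on May 3rd."),
    Turn("llm", "Which departure do you prefer, the 7am or the noon one?"),
]
```

Evaluating `deduce_attributes` on successive prefixes of `log` shows the evolution the abstract describes: the question is incomplete after the LLM's first clarifying turn, neither incomplete nor ambiguous once the user supplies the missing details, and ambiguous again when two readings remain open.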