🤖 AI Summary
This work addresses the problem that language models make implicit conceptual decisions when answering open-ended questions, leaving users with uncontextualized answers that are difficult to interpret. The authors propose the “Conceptual Multiverse”, an interactive system that adapts multiverse analysis from statistics to language model outputs. The system represents conceptual choices, such as how to frame a question or what to value, as a space that users can inspect and intervene on. A general verification framework, calibrated by expert-level domain reasoning, enforces properties of good decision structures such as unambiguity and completeness. In studies across philosophy, AI alignment, and poetry, participants developed a working map of complex problems: philosophy students rewrote essays with sharper framings and reversed theses, alignment annotators moved from surface preferences to reasoning about user intent and harm, and poets identified compositional patterns that clarified their taste.
📝 Abstract
When language models answer open-ended problems, they make hidden decisions that shape their outputs, leaving users with uncontextualized answers rather than a working map of the problem. Drawing on multiverse analysis from statistics, we build and evaluate the conceptual multiverse, an interactive system that represents conceptual decisions, such as how to frame a question or what to value, as a space that users can transparently inspect, intervene on, and check against principled domain reasoning. For this structure to be worth navigating rather than misleading, it must be rigorous and checkable against domain reasoning norms. We therefore develop a general verification framework that enforces properties of good decision structures, such as unambiguity and completeness, calibrated by expert-level reasoning. Across three domains, the conceptual multiverse helped participants develop a working map of the problem: philosophy students rewrote essays with sharper framings and reversed theses, alignment annotators moved from surface preferences to reasoning about user intent and harm, and poets identified compositional patterns that clarified their taste.