AI Summary
This work addresses the challenge of assessing contextual sufficiency in question-answering systems, particularly for inferential questions requiring multi-hop reasoning. We propose a two-stage structured judgment framework: first, generating verifiable hypotheses about missing information; second, validating their truthfulness via semantic clustering and re-examination of source passages. The method integrates large language model-driven hypothesis generation, reasoning-chain-guided justification, and consensus-driven semantic verification, substantially improving detection of implicit information gaps. Evaluated on multiple multi-hop and factoid QA benchmarks, our approach outperforms existing baselines in contextual sufficiency classification accuracy. Moreover, it enables precise localization of critical information deficits within reasoning chains, enhancing interpretability and diagnostic capability.
Abstract
Determining whether a provided context contains sufficient information to answer a question is a critical challenge for building reliable question-answering systems. While simple prompting strategies have shown success on factual questions, they frequently fail on inferential ones that require reasoning beyond direct text extraction. We hypothesize that asking a model to first reason about what specific information is missing provides a more reliable, implicit signal for assessing overall sufficiency. To this end, we propose a structured Identify-then-Verify framework for robust sufficiency modeling. Our method first generates multiple hypotheses about missing information and establishes a semantic consensus. It then performs a critical verification step, forcing the model to re-examine the source text to confirm whether this information is truly absent. We evaluate our method against established baselines across diverse multi-hop and factual QA datasets. The results demonstrate that by guiding the model to justify its claims about missing information, our framework produces more accurate sufficiency judgments while clearly articulating any information gaps.
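The Identify-then-Verify flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `propose` and `verify_absent` are hypothetical callables standing in for the language model calls, token-level Jaccard overlap is a crude substitute for whatever semantic similarity measure the paper uses, and treating the largest cluster as the consensus is likewise an assumption.

```python
from typing import Callable, List, Optional, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a crude stand-in for a semantic similarity model."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 1.0


def cluster(hypotheses: List[str], threshold: float = 0.5) -> List[List[str]]:
    """Greedy clustering: a hypothesis joins the first cluster whose seed is similar enough."""
    clusters: List[List[str]] = []
    for h in hypotheses:
        for c in clusters:
            if jaccard(h, c[0]) >= threshold:
                c.append(h)
                break
        else:
            clusters.append([h])
    return clusters


def judge_sufficiency(
    question: str,
    context: str,
    propose: Callable[[str, str], Optional[str]],  # hypothetical: one LLM sample of "what is missing?"
    verify_absent: Callable[[str, str], bool],     # hypothetical: LLM re-reads context, True if truly absent
    n_samples: int = 3,
) -> Tuple[bool, Optional[str]]:
    """Return (sufficient, missing_information)."""
    # Identify: sample several hypotheses about what information is missing.
    samples = (propose(question, context) for _ in range(n_samples))
    hypotheses = [h for h in samples if h]
    if not hypotheses:
        return True, None  # assumption: no proposed gap means the context is sufficient

    # Consensus: take the seed of the largest cluster (assumed consensus rule).
    consensus = max(cluster(hypotheses), key=len)[0]

    # Verify: re-examine the source text before trusting the claimed gap.
    if verify_absent(consensus, context):
        return False, consensus
    return True, None
```

In practice, `propose` would wrap a sampled model call that returns a short description of the missing fact (or `None` if it finds no gap), and `verify_absent` would wrap a second prompt that asks the model to check the context for that specific fact. Returning the consensus hypothesis alongside the verdict mirrors the framework's goal of articulating the information gap, not only classifying sufficiency.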