AI Summary
This work addresses hallucination detection in large language model (LLM) generations, where accuracy, efficiency, and cross-domain robustness must be balanced. The authors propose the Question-Answer Orthogonal Decomposition (QAOD) framework, which decomposes the answer representation into a component aligned with the input question and a component orthogonal to it. Keeping the question-orthogonal component suppresses domain-specific variation, and a selection mechanism that weighs both diversity and discriminability picks the layers and neurons to probe. QAOD pairs two complementary probes, a joint probe for in-domain detection and an orthogonal-only probe for cross-domain generalization, and performs white-box detection in a single forward pass. Experiments show that the joint probe achieves the best in-domain AUROC across all evaluated model-dataset combinations, while the orthogonal probe outperforms the best white-box baseline by up to 21% in cross-domain transfer on BioASQ, with computational overhead below 25% of the generation cost.
Abstract
Hallucination detection in large language models (LLMs) requires balancing accuracy, efficiency, and robustness to distribution shift. Black-box consistency methods are effective but demand repeated inference; single-pass white-box probes are efficient yet treat answer representations in isolation, often degrading sharply under domain shift. We propose QAOD (Question-Answer Orthogonal Decomposition), a single-pass framework that projects away the question-aligned direction from the answer representation to obtain a question-orthogonal component that suppresses domain-conditioned variation. To identify informative signals, QAOD further selects layers via diversity-penalized Fisher scoring and discriminative neurons via Fisher importance. To address both in-domain detection and cross-domain generalization, we design two complementary probing strategies: pairing the orthogonal component with question context yields a joint probe that maximizes in-domain discriminability, while using the orthogonal component alone preserves domain-agnostic factuality signals for robust transfer. QAOD's joint probe achieves the best in-domain AUROC across all evaluated model-dataset pairs, while the orthogonal-only probe delivers the strongest OOD transfer, surpassing the best white-box baseline by up to 21% on BioASQ at under 25% of generation cost.
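The projection and Fisher-based selection described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes each question and answer has already been pooled into a single hidden-state vector per example, and it uses a plain two-class Fisher ratio for neuron scoring. The abstract does not specify the pooling, the diversity penalty used for layer selection, or the probe architecture, so those parts are omitted. All function names, array shapes, and the synthetic data are hypothetical.

```python
import numpy as np

def orthogonal_component(answer, question, eps=1e-12):
    """Subtract from each answer vector its projection onto the paired question vector."""
    q_norm_sq = np.sum(question * question, axis=-1, keepdims=True)
    coeff = np.sum(answer * question, axis=-1, keepdims=True) / (q_norm_sq + eps)
    return answer - coeff * question

def fisher_scores(features, labels, eps=1e-12):
    """Per-neuron Fisher ratio: squared class-mean gap over summed class variances."""
    pos = features[labels == 1]
    neg = features[labels == 0]
    gap = pos.mean(axis=0) - neg.mean(axis=0)
    return gap ** 2 / (pos.var(axis=0) + neg.var(axis=0) + eps)

# Synthetic stand-ins for pooled hidden states (assumption: one vector per example).
rng = np.random.default_rng(0)
n, d = 200, 16
question = rng.normal(size=(n, d))
labels = rng.integers(0, 2, size=n)          # 1 = hallucinated, 0 = factual (toy labels)
answer = 0.8 * question + rng.normal(size=(n, d))
answer[:, 3] += 2.0 * labels                 # plant a toy label signal in neuron 3

# Question-orthogonal component of the answer representation.
a_perp = orthogonal_component(answer, question)

# Rank neurons by Fisher score and keep the top few.
scores = fisher_scores(a_perp, labels)
top_k = np.argsort(scores)[::-1][:4]

# Orthogonal-only probe input versus joint probe input (orthogonal part plus question context).
orth_features = a_perp[:, top_k]
joint_features = np.concatenate([orth_features, question[:, top_k]], axis=1)
```

In this sketch either feature matrix would then be passed to a lightweight classifier such as logistic regression. Concatenating the question vector for the joint probe is one plausible reading of "pairing the orthogonal component with question context", not a confirmed detail of the method.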