🤖 AI Summary
To address the propensity of large language models (LLMs) to generate non-factual responses in factual question answering, this paper introduces the task of *non-factuality prediction* (NFP): predicting whether an answer will be factually correct *before* generation, enabling proactive risk identification rather than post-hoc detection. The authors observe consistent cross-model patterns in hidden question representations that correlate with non-factuality across diverse LLMs. Leveraging this insight, they propose FacLens, a lightweight, transferable probe that trains a supervised binary classifier on hidden-layer representations and uses cross-model representation alignment to generalize across LLMs. Evaluated on Llama, Qwen, and ChatGLM models, FacLens improves F1 by an average of 3.2% over baselines while using under 1M parameters and under 5 ms of inference latency, providing an efficient, model-agnostic, and proactive safeguard for LLM content safety.
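The core idea of probing hidden question representations for NFP can be sketched as a small supervised classifier. Below is a minimal, illustrative stand-in (not the paper's exact architecture): a logistic-regression probe trained on simulated "hidden representations" labeled by whether the LLM later answered factually. The dimensions, synthetic data, and training loop are all assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: hidden-layer question representations from an LLM,
# here simulated as random vectors, labeled 1 if the model would answer
# non-factually and 0 otherwise. A real probe would use actual hidden
# states and generation-based labels.
d = 64                       # hidden size (illustrative)
n = 500                      # number of labeled questions
w_true = rng.normal(size=d)  # synthetic "non-factuality direction"
X = rng.normal(size=(n, d))
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Train a binary classifier over the representations with gradient descent.
w = np.zeros(d)
for _ in range(300):
    p = sigmoid(X @ w)
    w -= 0.1 * X.T @ (p - y) / n

acc = ((sigmoid(X @ w) > 0.5) == y).mean()
print(f"train accuracy: {acc:.2f}")
```

Because the probe only reads a fixed-size hidden vector, it stays tiny (here `d` weights) and runs in microseconds, which is consistent with the sub-1M-parameter, sub-5 ms profile claimed for FacLens.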
📝 Abstract
Despite advancements in large language models (LLMs), non-factual responses remain prevalent. In contrast to the extensive work on post-hoc detection of such responses, this work studies non-factuality prediction (NFP), which aims to predict whether an LLM will generate a non-factual response to a question before the generation process. Previous efforts on NFP have demonstrated LLMs' awareness of their internal knowledge, but they still face challenges in efficiency and transferability. In this work, we propose a lightweight NFP model named Factuality Lens (FacLens), which effectively probes hidden representations of questions for the NFP task. Moreover, we find that hidden question representations sourced from different LLMs exhibit similar NFP patterns, which enables the transfer of FacLens across LLMs and reduces development costs. Extensive experiments highlight FacLens's superiority in both effectiveness and efficiency.