FacLens: Transferable Probe for Foreseeing Non-Factuality in Large Language Models

📅 2024-06-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the propensity of large language models (LLMs) to generate non-factual responses in factual question answering, this paper introduces the task of *Non-Factual Prediction* (NFP): predicting the factual correctness of an answer *before* generation, enabling proactive risk identification rather than post-hoc detection. We observe consistent cross-model patterns in hidden question representations that correlate with non-factuality across diverse LLMs. Leveraging this insight, we propose FacLens: a lightweight, transferable probe model that trains a supervised binary classifier over hidden-layer representations and employs cross-model representation alignment for generalization. Evaluated on Llama, Qwen, and ChatGLM, FacLens achieves an average 3.2% F1 improvement over existing baselines, with under 1M parameters and under 5ms inference latency. It provides an efficient, model-agnostic, and proactive safeguard for LLM content safety.
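The core probing idea described above can be sketched with a toy example: a binary classifier over a question's hidden representation, predicting non-factuality before any answer is generated. The dimensions, training loop, and synthetic data below are illustrative assumptions, not the paper's actual architecture or datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN_DIM = 64  # stand-in for a real LLM hidden size (e.g., 4096)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class LinearProbe:
    """Logistic-regression probe over hidden question representations.

    Label convention (hypothetical): 1 = the LLM would answer
    non-factually, 0 = it would answer factually.
    """
    def __init__(self, dim, lr=0.1):
        self.w = np.zeros(dim)
        self.b = 0.0
        self.lr = lr

    def fit(self, H, y, epochs=200):
        # H: (n, dim) hidden states; y: (n,) binary labels
        for _ in range(epochs):
            p = sigmoid(H @ self.w + self.b)
            grad = p - y
            self.w -= self.lr * (H.T @ grad) / len(y)
            self.b -= self.lr * grad.mean()

    def predict(self, H):
        return (sigmoid(H @ self.w + self.b) > 0.5).astype(int)

# Synthetic clusters standing in for questions the model "knows"
# (factual) vs. does not know (non-factual).
n = 200
H_factual = rng.normal(-1.0, 1.0, size=(n, HIDDEN_DIM))
H_nonfactual = rng.normal(+1.0, 1.0, size=(n, HIDDEN_DIM))
H = np.vstack([H_factual, H_nonfactual])
y = np.concatenate([np.zeros(n), np.ones(n)])

probe = LinearProbe(HIDDEN_DIM)
probe.fit(H, y)
acc = (probe.predict(H) == y).mean()
print(f"train accuracy: {acc:.2f}")
```

Because the probe sees only the question's hidden state, inference is a single matrix-vector product, which is consistent with the sub-millisecond-scale latency and small parameter count the summary reports.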

📝 Abstract
Despite advancements in large language models (LLMs), non-factual responses remain prevalent. Unlike extensive studies on post-hoc detection of such responses, this work studies non-factuality prediction (NFP), aiming to predict whether an LLM will generate a non-factual response to a question before the generation process. Previous efforts on NFP have demonstrated LLMs' awareness of their internal knowledge, but they still face challenges in efficiency and transferability. In this work, we propose a lightweight NFP model named Factuality Lens (FacLens), which effectively probes hidden representations of questions for the NFP task. Moreover, we discover that hidden question representations sourced from different LLMs exhibit similar NFP patterns, which enables the transferability of FacLens across LLMs to reduce development costs. Extensive experiments highlight FacLens's superiority in both effectiveness and efficiency.
Problem

Research questions and friction points this paper is trying to address.

Predicting non-factual responses before generation
Improving efficiency and transferability in detection
Probing hidden representations across different LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight model probes hidden question representations
Transferable across different large language models
Predicts non-factuality before response generation
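The transferability claim above rests on cross-model representation alignment: hidden states from one LLM are mapped into another LLM's representation space so a probe trained once can be reused. A minimal sketch of one plausible alignment approach (a least-squares linear map on paired question representations; the paper's actual alignment method may differ, and all data here is synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)

DIM_A, DIM_B = 48, 64  # hypothetical hidden sizes of two different LLMs

# Paired hidden states for the SAME questions from LLM-A and LLM-B.
# We simulate the "similar NFP patterns" finding by making LLM-B's
# states a fixed linear transform of LLM-A's plus small noise.
n = 300
T_true = rng.normal(size=(DIM_A, DIM_B))
H_a = rng.normal(size=(n, DIM_A))
H_b = H_a @ T_true + 0.05 * rng.normal(size=(n, DIM_B))

# Least-squares alignment: learn T mapping LLM-A's space into LLM-B's,
# so a probe trained on LLM-B representations can score LLM-A questions.
T, *_ = np.linalg.lstsq(H_a, H_b, rcond=None)

H_a_aligned = H_a @ T
rel_err = np.linalg.norm(H_a_aligned - H_b) / np.linalg.norm(H_b)
print(f"alignment relative error: {rel_err:.3f}")
```

The appeal of this setup is the development cost it saves: only a small alignment map needs fitting per new LLM, rather than collecting fresh factuality labels and retraining the probe from scratch.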
Yanling Wang
Zhipu AI
Data Mining · Natural Language Processing
Haoyang Li
Renmin University of China
Hao Zou
Zhongguancun Laboratory
Jing Zhang
Renmin University of China
Xinlei He
Assistant Professor, HKUST(GZ)
Trustworthy Machine Learning · Security · Privacy
Qi Li
Tsinghua University
Ke Xu
Tsinghua University