🤖 AI Summary
In high-stakes applications, the absence of unified, scalable evaluation tools hinders reliable prediction of large language model (LLM) output veracity. To address this, we introduce and open-source a comprehensive Python library, the first to systematically integrate over 30 veracity prediction methods across multiple dimensions: black-box vs. white-box access, self-supervised vs. supervised training, and reference-document-dependent vs. independent paradigms. The library supports both the Hugging Face and LiteLLM ecosystems, accommodates local and API-hosted LLMs, and provides end-to-end functionality for generation, evaluation, calibration, and long-form veracity prediction. Representative methods are evaluated on TriviaQA, GSM8K, and FactScore-Bio. Our work enhances reproducibility and usability, bridging a gap between methodological diversity and engineering practice in LLM veracity prediction.
📝 Abstract
Generative Large Language Models (LLMs) inevitably produce untruthful responses. Accurately predicting the truthfulness of these outputs is critical, especially in high-stakes settings. To accelerate research in this domain and make truthfulness prediction methods more accessible, we introduce TruthTorchLM, an open-source, comprehensive Python library featuring over 30 truthfulness prediction methods, which we refer to as Truth Methods. Unlike existing toolkits such as Guardrails, which focus solely on document-grounded verification, or LM-Polygraph, which is limited to uncertainty-based methods, TruthTorchLM offers a broad and extensible collection of techniques. These methods span diverse tradeoffs in computational cost, access level (e.g., black-box vs. white-box), grounding document requirements, and supervision type (self-supervised or supervised). TruthTorchLM is seamlessly compatible with both HuggingFace and LiteLLM, enabling support for locally hosted and API-based models. It also provides a unified interface for generation, evaluation, calibration, and long-form truthfulness prediction, along with a flexible framework for extending the library with new methods. We conduct an evaluation of representative truth methods on three datasets: TriviaQA, GSM8K, and FactScore-Bio. The code is available at https://github.com/Ybakman/TruthTorchLM.
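To make the "truth method" idea concrete, the following is a minimal, self-contained sketch of one classic black-box, self-supervised technique: self-consistency, which samples a question several times and uses the agreement rate of the most frequent answer as a confidence score. This is an illustration of the general technique only; it is not TruthTorchLM's actual API, and `sample_fn` / `toy_sampler` are hypothetical names introduced here for the example.

```python
import random
from collections import Counter

def self_consistency_score(sample_fn, question, n_samples=10):
    """Black-box truthfulness signal: sample n answers from the model
    and return (majority answer, fraction of samples that agree).

    A higher agreement fraction is treated as higher confidence that
    the majority answer is truthful. Requires no model internals,
    no grounding documents, and no labeled supervision.
    """
    answers = [sample_fn(question) for _ in range(n_samples)]
    majority, count = Counter(answers).most_common(1)[0]
    return majority, count / n_samples

# Toy stand-in for an LLM sampler (a real setup would call a
# HuggingFace or LiteLLM model with nonzero temperature).
def toy_sampler(question):
    return random.choice(["Paris", "Paris", "Paris", "Lyon"])

random.seed(0)
answer, confidence = self_consistency_score(toy_sampler, "Capital of France?")
print(answer, confidence)
```

Methods in the other quadrants of the taxonomy differ mainly in what they need: white-box methods read token probabilities or hidden states, supervised methods train a scorer on labeled outputs, and document-grounded methods check the answer against a reference text.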