Reducing Hallucination in Enterprise AI Workflows via Hybrid Utility Minimum Bayes Risk (HUMBR)

📅 2026-04-13

🤖 AI Summary
This work addresses hallucination in large language models deployed in high-stakes enterprise settings such as legal reasoning and risk management, where erroneous outputs can have severe consequences. The study formulates hallucination mitigation as a minimum Bayes risk (MBR) optimization problem and introduces the Hybrid Utility MBR (HUMBR) framework. HUMBR combines semantic embedding similarity with lexical precision to identify the consensus among candidate outputs, so it needs neither ground-truth references nor human-annotated labels, and the authors derive theoretical error bounds for it. Evaluated on TruthfulQA, LegalBench, and Meta's production data, the MBR approach significantly outperforms standard Universal Self-Consistency; 81% of the pipeline's suggestions were preferred over human-crafted ground truth, and critical recall failures were virtually eliminated.
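
As a hedged illustration of the formulation summarized above: the standard MBR decision rule selects, from N candidate outputs, the one with the highest average utility against the remaining candidates. The weighted-sum form of the utility and the mixing weight \lambda are assumptions made here for illustration; this page does not give the paper's exact definition.

```latex
\hat{y} = \arg\max_{y_i \in \{y_1, \dots, y_N\}} \frac{1}{N-1} \sum_{j \neq i} u(y_i, y_j),
\qquad
u(y_i, y_j) = \lambda \, \mathrm{sim}_{\mathrm{emb}}(y_i, y_j) + (1 - \lambda) \, \mathrm{prec}_{\mathrm{lex}}(y_i, y_j),
\quad \lambda \in [0, 1]
```

Here \mathrm{sim}_{\mathrm{emb}} stands for an embedding-based similarity (for example, cosine similarity) and \mathrm{prec}_{\mathrm{lex}} for the share of tokens in y_i that also appear in y_j.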

📝 Abstract
Although LLMs drive automation, high-stakes enterprise workflows such as those involving legal matters, risk management, and privacy compliance demand particular care. For Meta, and other organizations like ours, a single hallucinated clause in such high-stakes workflows risks material consequences. We show that by framing hallucination mitigation as a Minimum Bayes Risk (MBR) problem, we can dramatically reduce this risk. Specifically, we introduce a Hybrid Utility MBR (HUMBR) framework that synthesizes semantic embedding similarity with lexical precision to identify consensus without ground-truth references, and we derive rigorous error bounds for it. We complement this theoretical analysis with a comprehensive empirical evaluation on widely used public benchmark suites (TruthfulQA and LegalBench) and on real-world data from a Meta production deployment. The results show that MBR significantly outperforms standard Universal Self-Consistency. Notably, 81% of the pipeline's suggestions were preferred over human-crafted ground truth, and critical recall failures were virtually eliminated.
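
The reference-free selection described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the bag-of-words `toy_embed` stands in for a real sentence encoder, the clipped unigram `lexical_precision`, the weighted-sum `hybrid_utility`, and the weight `lam` are all hypothetical choices, since the abstract does not specify how the two signals are combined.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


def toy_embed(text):
    """ASSUMPTION: bag-of-words counts stand in for a real sentence embedding."""
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def lexical_precision(candidate, reference):
    """Share of candidate tokens that also occur in the reference (clipped counts)."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[tok]) for tok, count in cand.items())
    return overlap / total


def hybrid_utility(candidate, reference, lam=0.5, embed=toy_embed):
    """ASSUMPTION: a weighted sum of semantic similarity and lexical precision."""
    semantic = cosine(embed(candidate), embed(reference))
    return lam * semantic + (1 - lam) * lexical_precision(candidate, reference)


def mbr_select(samples, lam=0.5, embed=toy_embed):
    """Return the sample with the highest mean utility against all other samples."""
    if not samples:
        raise ValueError("need at least one sample")
    if len(samples) == 1:
        return samples[0]

    def expected_utility(i):
        others = [s for j, s in enumerate(samples) if j != i]
        return sum(hybrid_utility(samples[i], o, lam, embed) for o in others) / len(others)

    return samples[max(range(len(samples)), key=expected_utility)]


# Usage: four hypothetical model samples; the last one adds an unsupported clause.
samples = [
    "the notice period is 30 days",
    "notice period is 30 days",
    "the notice period is thirty days",
    "the notice period is 90 days plus a penalty fee",
]
print(mbr_select(samples))
```

On these toy inputs the sketch selects the first sample, the answer most of the others agree with, while the outlier with the extra clause scores lowest. In practice `embed` would be swapped for a real sentence encoder, and `lam` would be tuned on held-out data.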

Problem

Research questions and friction points this paper is trying to address.

Hallucination
Enterprise AI
Minimum Bayes Risk
High-stakes Workflows
LLM Reliability

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Utility MBR
hallucination mitigation
semantic embedding
lexical precision
Minimum Bayes Risk
Chenhao Fang
Meta Platforms, Inc.
Jordi Mola
Meta Platforms, Inc.
Mark Harman
Research Scientist at Meta & Professor of Software Engineering at UCL
SBSE, Software Testing, Evolutionary Computation, Program Analysis, Software Engineering
Jason Nawrocki
Meta Platforms, Inc.
Vaibhav Shrivastava
Meta Platforms, Inc.
Yue Cheng
Meta Platforms, Inc.
Jay Minesh Shah
Meta Platforms, Inc.
Katayoun Zand
Meta Platforms, Inc.
Mansi Tripathi
Meta Platforms, Inc.
Arya Pudota
Meta Platforms, Inc.
Matthew Becker
Meta Platforms, Inc.
Hervé Robert
Meta Platforms, Inc.
Abhishek Gulati
Meta Platforms, Inc.