Confidence Calibration in Large Language Model-Based Entity Matching

📅 2025-09-23

🤖 AI Summary

This work examines confidence calibration for a RoBERTa-based model applied to entity matching. It is an empirical study that compares the baseline RoBERTa confidences against confidences calibrated with three techniques: Temperature Scaling, Monte Carlo Dropout, and Ensembles. The evaluation covers four entity matching benchmarks: Abt-Buy, DBLP-ACM, iTunes-Amazon, and Company. The baseline model is slightly overconfident, with Expected Calibration Error (ECE) scores ranging from 0.0043 to 0.0552 across datasets. Temperature Scaling mitigates this overconfidence, reducing ECE by up to 23.83%. The study provides an empirical reference point for how reliable the confidence scores of language-model-based entity matchers are.

📝 Abstract

This research aims to explore the intersection of Large Language Models and confidence calibration in Entity Matching. To this end, we perform an empirical study to compare baseline RoBERTa confidences for an Entity Matching task against confidences that are calibrated using Temperature Scaling, Monte Carlo Dropout and Ensembles. We use the Abt-Buy, DBLP-ACM, iTunes-Amazon and Company datasets. The findings indicate that the proposed modified RoBERTa model exhibits a slight overconfidence, with Expected Calibration Error scores ranging from 0.0043 to 0.0552 across datasets. We find that this overconfidence can be mitigated using Temperature Scaling, reducing Expected Calibration Error scores by up to 23.83%.
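Expected Calibration Error, the metric reported above, is commonly computed by grouping predictions into confidence bins and taking the weighted average gap between accuracy and mean confidence in each bin. The following is a minimal sketch of that common formulation, not the paper's evaluation code. The bin count of 10, the equal-width binning, and the function name are assumptions, since the abstract does not state them.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumed bin count; not stated in the abstract
) -> float:
    """Equal-width binned ECE: sum over bins of (n_b / N) * |acc_b - conf_b|."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        raise ValueError("at least one prediction is required")

    conf_sum = [0.0] * n_bins   # sum of confidences per bin
    hit_sum = [0.0] * n_bins    # number of correct predictions per bin
    count = [0] * n_bins        # number of predictions per bin

    for c, ok in zip(confidences, correct):
        # Bin k covers [k / n_bins, (k + 1) / n_bins); c == 1.0 goes to the last bin.
        idx = min(int(c * n_bins), n_bins - 1)
        conf_sum[idx] += c
        hit_sum[idx] += 1.0 if ok else 0.0
        count[idx] += 1

    ece = 0.0
    for s_conf, s_hit, cnt in zip(conf_sum, hit_sum, count):
        if cnt == 0:
            continue  # empty bins contribute nothing
        ece += (cnt / n) * abs(s_hit / cnt - s_conf / cnt)
    return ece


if __name__ == "__main__":
    # Toy data: four predictions at 90% confidence, three of them correct.
    print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [True, True, True, False]))
```

In the toy call, accuracy is 0.75 while mean confidence is 0.9, so the gap reflects overconfidence of the kind the abstract describes.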

Problem

Research questions and friction points this paper is trying to address.

Calibrating confidence scores for entity matching using large language models

Addressing overconfidence issues in RoBERTa model predictions for entity matching

Improving confidence calibration accuracy through temperature scaling techniques
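Temperature scaling, as generally described, divides the classifier's logits by a single scalar T fitted on held-out data, which softens or sharpens confidences without changing the predicted class. The sketch below illustrates the idea under stated assumptions: a two-class (non-match, match) logit output, a log-spaced grid search for T, and hypothetical function names. The paper's actual fitting procedure is not given in this summary.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Softmax over logits / T, computed in a numerically stable way."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def negative_log_likelihood(
    logit_rows: Sequence[Sequence[float]], labels: Sequence[int], temperature: float
) -> float:
    """Mean NLL of the true labels under the temperature-scaled softmax."""
    total = 0.0
    for logits, y in zip(logit_rows, labels):
        p = softmax_with_temperature(logits, temperature)[y]
        total -= math.log(max(p, 1e-12))  # clamp to avoid log(0)
    return total / len(labels)


def fit_temperature(
    logit_rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    lo: float = 0.05,   # assumed search range
    hi: float = 10.0,
    steps: int = 200,
) -> float:
    """Pick the T on a log-spaced grid that minimizes validation NLL."""
    best_t = 1.0
    best_nll = negative_log_likelihood(logit_rows, labels, best_t)
    for i in range(steps + 1):
        t = lo * (hi / lo) ** (i / steps)
        nll = negative_log_likelihood(logit_rows, labels, t)
        if nll < best_nll:
            best_t, best_nll = t, nll
    return best_t


if __name__ == "__main__":
    # Toy validation set: very confident logits, but only 3 of 4 predictions are right.
    val_logits = [[4.0, -4.0], [4.0, -4.0], [4.0, -4.0], [-4.0, 4.0]]
    val_labels = [0, 0, 0, 0]
    t = fit_temperature(val_logits, val_labels)
    print("fitted T:", t)
    print("calibrated confidence:", max(softmax_with_temperature([4.0, -4.0], t)))
```

A fitted T above 1 indicates an overconfident model. Because T rescales all logits equally, the argmax and therefore the match decision are unchanged; only the confidence attached to it moves.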

Innovation

Methods, ideas, or system contributions that make the work stand out.

Calibrating RoBERTa confidences with Temperature Scaling

Comparing Monte Carlo Dropout and Ensemble calibration methods

Reducing Expected Calibration Error by up to 23.83%
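Monte Carlo Dropout and Ensembles share one mechanical step: several probability vectors for the same record pair are averaged, and the confidence is read off the averaged distribution. The sketch below shows only that step. The `stochastic_forward` callable is a hypothetical stand-in for a RoBERTa forward pass with dropout left active; the number of passes and the toy inputs are likewise assumptions rather than details from the paper.

```python
from typing import Callable, List, Sequence, Tuple


def average_probabilities(member_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average class probabilities over ensemble members or dropout passes."""
    if not member_probs:
        raise ValueError("at least one probability vector is required")
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[k] for p in member_probs) / n_members for k in range(n_classes)]


def mc_dropout_predict(
    stochastic_forward: Callable[[Tuple[str, str]], Sequence[float]],  # hypothetical model call
    pair: Tuple[str, str],
    n_passes: int = 20,  # assumed number of stochastic passes
) -> List[float]:
    """Run several stochastic forward passes on one record pair and average them."""
    samples = [stochastic_forward(pair) for _ in range(n_passes)]
    return average_probabilities(samples)


def confidence_and_label(probs: Sequence[float]) -> Tuple[float, int]:
    """Return (confidence, predicted class index) from a probability vector."""
    label = max(range(len(probs)), key=lambda k: probs[k])
    return probs[label], label


if __name__ == "__main__":
    # Ensemble case: three members disagree on how sure they are about a match (class 1).
    members = [[0.05, 0.95], [0.30, 0.70], [0.10, 0.90]]
    print(confidence_and_label(average_probabilities(members)))
```

Averaging tends to pull extreme confidences toward the center when members or passes disagree, which is why both techniques are studied as calibration methods.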
