Beyond the Final Layer: Intermediate Representations for Better Multilingual Calibration in Large Language Models

📅 2025-10-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies a pervasive confidence miscalibration in large language models (LLMs) under multilingual settings, one that is particularly severe for non-English languages and stems from English-dominant training biases embedded in the final output layer. The authors first observe that late-intermediate layers calibrate multilingual confidence better than the output layer does. Building on this insight, they propose LACE (Language-Aware Confidence Ensemble), a training-free, language-aware layer-ensemble method that combines inter-layer representation analysis, dynamic weight assignment, and language-adaptive integration to improve cross-lingual confidence calibration. Extensive experiments across six model families and over 100 languages show that LACE significantly improves calibration for non-English languages, reducing Expected Calibration Error (ECE) by an average of 32%. The work charts a path toward fair, trustworthy, and globally applicable LLMs.
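The Expected Calibration Error (ECE) reported above is a standard metric: predictions are grouped into confidence bins, and ECE is the weighted average gap between each bin's mean confidence and its accuracy. A minimal sketch (the paper's exact binning settings are not given here):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: weighted average |mean confidence - accuracy| over
    equal-width confidence bins. `correct` holds 0/1 outcomes."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        mask = (confidences > lo) & (confidences <= hi)
        if i == 0:
            mask |= confidences == 0.0  # first bin also takes exact zeros
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by fraction of samples in bin
    return ece

# Perfectly calibrated toy case: 90% confidence, 9/10 correct
print(expected_calibration_error([0.9] * 10, [1] * 9 + [0]))  # → 0.0
```

A lower ECE means the model's stated confidence tracks its actual accuracy more closely, which is the quantity LACE improves for non-English languages.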

📝 Abstract
Confidence calibration, the alignment of a model's predicted confidence with its actual accuracy, is crucial for the reliable deployment of Large Language Models (LLMs). However, this critical property remains largely under-explored in multilingual contexts. In this work, we conduct the first large-scale, systematic study of multilingual calibration across six model families and over 100 languages, revealing that non-English languages suffer from systematically worse calibration. To diagnose this, we investigate the model's internal representations and find that the final layer, biased by English-centric training, provides a poor signal for multilingual confidence. In contrast, our layer-wise analysis uncovers a key insight: late-intermediate layers consistently offer a more reliable and better-calibrated signal. Building on this, we introduce a suite of training-free methods, including Language-Aware Confidence Ensemble (LACE), which adaptively selects an optimal ensemble of layers for each specific language. Our study highlights the hidden costs of English-centric alignment and offers a new path toward building more globally equitable and trustworthy LLMs by looking beyond the final layer.
Problem

Research questions and friction points this paper is trying to address.

Investigating multilingual calibration issues in large language models
Addressing English-centric bias in final layer representations
Developing training-free methods for better multilingual confidence calibration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses late-intermediate layers for calibration
Introduces training-free Language-Aware Confidence Ensemble
Adaptively selects optimal layers per language
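The bullets above can be sketched as a weighted combination of per-layer confidences, with weights chosen per language. This is a hypothetical illustration of the idea; the names (`ensemble_confidence`, the layer indices, and the German-language weights) are assumptions, and the paper's actual LACE weighting scheme may differ:

```python
import numpy as np

def ensemble_confidence(layer_probs, layer_weights):
    """Hypothetical sketch of a language-aware layer ensemble:
    combine each layer's probability for the predicted answer into
    one confidence score via a normalized weighted average.
    layer_probs:   {layer index -> P(predicted answer) at that layer}
    layer_weights: {layer index -> language-specific weight}"""
    layers = sorted(layer_weights)
    w = np.array([layer_weights[l] for l in layers], dtype=float)
    w /= w.sum()  # normalize so weights sum to 1
    p = np.array([layer_probs[l] for l in layers], dtype=float)
    return float(w @ p)

# Toy example for a 32-layer model: late-intermediate layers dominate,
# while the (often overconfident) final layer gets little weight.
probs = {24: 0.72, 26: 0.70, 28: 0.68, 32: 0.95}
weights_de = {24: 0.4, 26: 0.3, 28: 0.2, 32: 0.1}  # hypothetical weights for German
print(round(ensemble_confidence(probs, weights_de), 3))  # → 0.729
```

Because the weights are a lookup per language rather than learned parameters, this kind of ensemble stays training-free, consistent with the method described above.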