🤖 AI Summary
This work addresses the challenge that existing large language models struggle to control the distribution of output attributes, such as gender, race, or sentiment, across repeated generations so that it matches real-world or target statistics. The authors propose a novel fine-tuning framework that couples Steering Token Calibration with Semantic Alignment, introducing distribution alignment as a core evaluation dimension for repeated generation. By anchoring the probability mass of steering tokens via Kullback-Leibler divergence and binding those tokens to semantically consistent responses via Kahneman-Tversky Optimization, the method tightly couples distributional control with coherent generation. Evaluated on six diverse datasets, the approach significantly outperforms baseline methods, achieving precise distributional control in occupation-related attribute generation and overcoming the limitations of conventional alignment techniques, such as prompt engineering and Direct Preference Optimization, in distributional regulation.
📝 Abstract
While the real world is inherently stochastic, Large Language Models (LLMs) are predominantly evaluated on single-round inference against fixed ground truths. In this work, we shift the lens to distribution alignment: assessing whether LLMs, when prompted repeatedly, can generate outputs that adhere to a desired target distribution, e.g., reflecting real-world statistics or a uniform distribution. We formulate distribution alignment using the attributes of gender, race, and sentiment within occupational contexts. Our empirical analysis reveals that off-the-shelf LLMs and standard alignment techniques, including prompt engineering and Direct Preference Optimization, fail to reliably control output distributions. To bridge this gap, we propose a novel fine-tuning framework that couples Steering Token Calibration with Semantic Alignment. We introduce a hybrid objective function combining Kullback-Leibler divergence, to anchor the probability mass of latent steering tokens, with Kahneman-Tversky Optimization, to bind these tokens to semantically consistent responses. Experiments across six diverse datasets demonstrate that our approach significantly outperforms baselines, achieving precise distributional control in attribute generation tasks.
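The distributional half of the hybrid objective can be illustrated with a minimal sketch: a KL divergence term that penalizes the gap between the model's probability mass over steering tokens and a target distribution (here, uniform). The helper names, the uniform target, the weight `lambda_kto`, and the placeholder KTO term are illustrative assumptions, not the paper's actual implementation:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical target: a uniform distribution over three steering tokens
# (e.g., three attribute values to be balanced across repeated generations).
target = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical probability mass the model currently assigns to those tokens.
model = [0.6, 0.3, 0.1]

# KL term anchoring the steering-token mass to the target distribution.
kl_term = kl_divergence(model, target)

# Placeholder for the KTO-style semantic term; in the actual framework this
# would score whether the steered responses remain semantically consistent.
kto_term = 0.0
lambda_kto = 0.5  # illustrative weighting between the two terms

loss = kl_term + lambda_kto * kto_term
```

The KL term is zero exactly when the model's steering-token mass matches the target, so minimizing this loss drives repeated generations toward the desired attribute distribution while the second term (sketched here as a placeholder) keeps responses coherent.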