🤖 AI Summary
This work addresses the challenge that existing large language models struggle to control the distribution of output attributes, such as gender, race, or sentiment, across repeated generations so that it matches real-world or target statistics. The authors propose a novel fine-tuning framework that couples Steering Token Calibration with Semantic Alignment, introducing distribution alignment as a core evaluation dimension for repeated generation. By anchoring the probability mass of steering tokens via Kullback-Leibler divergence and binding those tokens to semantically consistent responses via Kahneman-Tversky Optimization, the method tightly couples distributional control with coherent generation. Evaluated on six diverse datasets, the approach significantly outperforms baseline methods, achieving precise distributional control in occupation-related attribute generation and overcoming the limitations of conventional alignment techniques, such as prompt engineering and Direct Preference Optimization, in distributional regulation.
📝 Abstract
While the real world is inherently stochastic, Large Language Models (LLMs) are predominantly evaluated on single-round inference against fixed ground truths. In this work, we shift the lens to distribution alignment: assessing whether LLMs, when prompted repeatedly, can generate outputs that adhere to a desired target distribution, e.g., reflecting real-world statistics or a uniform distribution. We formulate distribution alignment using the attributes of gender, race, and sentiment within occupational contexts. Our empirical analysis reveals that off-the-shelf LLMs and standard alignment techniques, including prompt engineering and Direct Preference Optimization, fail to reliably control output distributions. To bridge this gap, we propose a novel fine-tuning framework that couples Steering Token Calibration with Semantic Alignment. We introduce a hybrid objective function combining Kullback-Leibler divergence, to anchor the probability mass of latent steering tokens, with Kahneman-Tversky Optimization, to bind these tokens to semantically consistent responses. Experiments across six diverse datasets demonstrate that our approach significantly outperforms baselines, achieving precise distributional control in attribute generation tasks.
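The distributional half of the hybrid objective can be illustrated with a minimal sketch: a KL divergence term that penalizes the gap between the model's probability mass over steering tokens and a target distribution (here, uniform). The helper names, the uniform target, the weight `lambda_kto`, and the placeholder KTO term are illustrative assumptions, not the paper's actual implementation:

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as probability lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Hypothetical target: a uniform distribution over three steering tokens
# (e.g., three attribute values to be balanced across repeated generations).
target = [1 / 3, 1 / 3, 1 / 3]

# Hypothetical probability mass the model currently assigns to those tokens.
model = [0.6, 0.3, 0.1]

# KL term anchoring the steering-token mass to the target distribution.
kl_term = kl_divergence(model, target)

# Placeholder for the KTO-style semantic term; in the actual framework this
# would score whether the steered responses remain semantically consistent.
kto_term = 0.0
lambda_kto = 0.5  # illustrative weighting between the two terms

loss = kl_term + lambda_kto * kto_term
```

The KL term is zero exactly when the model's steering-token mass matches the target, so minimizing this loss drives repeated generations toward the desired attribute distribution while the second term (sketched here as a placeholder) keeps responses coherent.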