🤖 AI Summary
This work addresses the loss of output diversity that occurs when large language models are fine-tuned to produce persona- or tone-conditioned responses. The authors call this failure Cross-Style Collapse and attribute it to the cross-entropy objective. To mitigate it, they propose Semantic Flow Regularization (SFR), a lightweight auxiliary objective that uses conditional flow matching to supervise the backbone with continuous sentence-encoder embeddings of future text segments. Because the flow starts from a stochastic source, the objective preserves a multi-modal output distribution by construction, and the flow-matching head is discarded at inference, so deployment cost is unchanged. On a large-scale industrial dialogue dataset (Qwen3-32B, 9 personas), SFR improves diversity, style fidelity, and response quality over supervised fine-tuning. On LiveCodeBench-v5 (Qwen2.5-Coder-7B-Instruct), it consistently improves pass@k, indicating that it generalizes beyond stylized dialogue, and a controlled comparison on MBPP shows multi-token prediction to be a degenerate special case of SFR.
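The flow-matching auxiliary objective summarized above can be sketched as follows. This is a minimal illustration, not the authors' implementation, and it rests on assumptions the summary does not state: a straight-line (rectified-flow style) path from a Gaussian source sample to the target segment embedding, and a hypothetical linear velocity head conditioned on the backbone hidden state. The names `flow_matching_loss`, `total_loss`, `W`, and the weight `lam` are illustrative only.

```python
import numpy as np


def flow_matching_loss(h, e, W, rng):
    """Linear-path conditional flow matching loss (illustrative sketch).

    h   : (B, d_h) backbone hidden states at the prediction position
    e   : (B, d_e) target embeddings of the future segment (assumed to come
          from a frozen sentence encoder)
    W   : (d_e + 1 + d_h, d_e) weights of a hypothetical linear velocity head
    rng : numpy Generator that supplies the stochastic flow source
    """
    B, d_e = e.shape
    x0 = rng.standard_normal((B, d_e))       # stochastic source sample
    t = rng.uniform(0.0, 1.0, size=(B, 1))   # flow time, one per example
    x_t = (1.0 - t) * x0 + t * e             # point on the straight path x0 -> e
    target_v = e - x0                        # constant velocity of that path
    inp = np.concatenate([x_t, t, h], axis=1)
    pred_v = inp @ W                         # velocity predicted from (x_t, t, h)
    # Mean over the batch of the squared error in velocity space.
    return float(np.mean(np.sum((pred_v - target_v) ** 2, axis=1)))


def total_loss(ce_loss, fm_loss, lam=0.1):
    # Auxiliary term added to the usual next-token cross-entropy;
    # lam is a hypothetical weighting coefficient.
    return ce_loss + lam * fm_loss


if __name__ == "__main__":
    data_rng = np.random.default_rng(1)
    B, d_h, d_e = 4, 8, 5
    h = data_rng.standard_normal((B, d_h))
    e = data_rng.standard_normal((B, d_e))
    W = 0.01 * data_rng.standard_normal((d_e + 1 + d_h, d_e))
    fm = flow_matching_loss(h, e, W, np.random.default_rng(0))
    print(total_loss(2.3, fm))
```

NumPy has no autograd, so this only computes the loss value. In an actual training setup, gradients of this term would flow through `h` into the backbone, which is how the head "supervises" it; after training, only the backbone is kept.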
📝 Abstract
When large language models are fine-tuned to generate persona- or tone-conditioned responses, their output diversity is severely limited, a failure we term Cross-Style Collapse. We trace this collapse to the cross-entropy objective, which, under shared representations, tends to suppress diverse continuations. We propose Semantic Flow Regularization (SFR), a lightweight auxiliary objective that supervises the backbone with continuous sentence-encoder embeddings of future segments via conditional flow matching. The stochastic flow source preserves multi-modality by construction, and the flow-matching head is discarded at inference, so SFR adds no deployment cost. On a large-scale industrial dialogue dataset (Qwen3-32B, 9 personas), SFR improves output diversity, style fidelity, and response quality over supervised fine-tuning (SFT). We further validate SFR on the public LiveCodeBench-v5 benchmark (Qwen2.5-Coder-7B-Instruct), where it consistently improves pass@k, indicating that it generalizes beyond stylized dialogue. A controlled comparison on MBPP reveals Multi-Token Prediction to be a degenerate special case of SFR.
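Pass@k on code benchmarks is conventionally computed with the unbiased estimator pass@k = 1 - C(n-c, k) / C(n, k), where n programs are sampled per problem and c of them pass all tests; the benchmark score is the mean over problems. The abstract does not state which estimator or sample count was used, so the sketch below simply assumes this standard one. The function names are illustrative.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem.

    n: number of sampled programs, c: number that pass all tests,
    k: evaluation budget (k <= n).
    """
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset contains at least one passing sample.
        return 1.0
    # 1 minus the probability that a random size-k subset has no passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k: int) -> float:
    """Mean pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

For example, with 10 samples of which 3 pass, `pass_at_k(10, 3, 1)` is 0.3 (up to floating-point rounding), the fraction of single draws that succeed.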