Enhancing Complex Instruction Following for Large Language Models with Mixture-of-Contexts Fine-tuning

📅 2025-05-17
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) suffer from insufficient sub-context modeling when following complex instructions with multiple constraints, limiting the effectiveness of supervised fine-tuning (SFT). Method: This paper proposes the Mixture-of-Contexts (MoC) paradigm and a multi-input single-output (MISO) fine-tuning architecture. MoC explicitly decomposes sequential complex instructions into parallel sub-contexts; MISO employs a multi-input attention fusion mechanism within decoder-only LLMs to jointly model instruction-output alignment at the global level and capture fine-grained contributions of each sub-context—enabling, for the first time in SFT, explicit sub-context modeling. Results: Experiments on multiple complex instruction benchmarks demonstrate an average 12.3% improvement in instruction-following accuracy and a 23% reduction in training steps, confirming both enhanced performance and improved training efficiency.

Technology Category

Application Category

📝 Abstract
Large language models (LLMs) exhibit remarkable capabilities in handling natural language tasks; however, they may struggle to consistently follow complex instructions including those involve multiple constraints. Post-training LLMs using supervised fine-tuning (SFT) is a standard approach to improve their ability to follow instructions. In addressing complex instruction following, existing efforts primarily focus on data-driven methods that synthesize complex instruction-output pairs for SFT. However, insufficient attention allocated to crucial sub-contexts may reduce the effectiveness of SFT. In this work, we propose transforming sequentially structured input instruction into multiple parallel instructions containing subcontexts. To support processing this multi-input, we propose MISO (Multi-Input Single-Output), an extension to currently dominant decoder-only transformer-based LLMs. MISO introduces a mixture-of-contexts paradigm that jointly considers the overall instruction-output alignment and the influence of individual sub-contexts to enhance SFT effectiveness. We apply MISO fine-tuning to complex instructionfollowing datasets and evaluate it with standard LLM inference. Empirical results demonstrate the superiority of MISO as a fine-tuning method for LLMs, both in terms of effectiveness in complex instruction-following scenarios and its potential for training efficiency.
Problem

Research questions and friction points this paper is trying to address.

Improving LLMs' ability to follow complex multi-constraint instructions
Addressing insufficient sub-context attention in supervised fine-tuning
Enhancing instruction-output alignment through parallel sub-context processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transforms sequential instructions into parallel subcontexts
Introduces MISO for multi-input single-output processing
Enhances SFT with mixture-of-contexts paradigm
🔎 Similar Papers
No similar papers found.
Yuheng Lu
Yuheng Lu
Peking University
3D Computer Vision
Z
ZiMeng Bai
School of Artificial Intelligence, Beijing University of Posts and Telecommunications
C
Caixia Yuan
School of Artificial Intelligence, Beijing University of Posts and Telecommunications
Huixing Jiang
Huixing Jiang
Meituan Group
NLP
X
Xiaojie Wang
School of Artificial Intelligence, Beijing University of Posts and Telecommunications