RuleR: Improving LLM Controllability by Rule-based Data Recycling

📅 2024-06-22
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
Large language models (LLMs) still exhibit limited controllability over their responses, and existing supervised fine-tuning (SFT) approaches to improve it rely on costly human annotation or distillation from proprietary models. This paper proposes a rule-driven data-recycling paradigm: multi-constraint conditions are injected into original instructions via rule-based templates, and the corresponding responses are edited to satisfy them, yielding high-controllability training samples at no annotation cost. The method requires no human labeling, no black-box model distillation, and no additional data collection, so it can be plugged into standard fine-tuning pipelines. It achieves significant gains across multiple controllability benchmarks while preserving baseline instruction-following capability, and its training overhead remains comparable to standard SFT. The core contribution is a lightweight, interpretable, and composable framework for rule-based data construction that enables precise, transparent, and modular control over LLM behavior without external dependencies.

📝 Abstract
Large language models (LLMs) still lack delicate controllability over their responses, which is critical to enhancing their performance and the user experience. However, curating supervised fine-tuning (SFT) datasets to improve LLM controllability usually relies on human experts or proprietary LLMs, which requires additional costs. To bridge this gap, we propose Rule-based Data Recycling (RuleR), a data augmentation method incorporating multiple constraints into the original data samples according to predefined rules, which creates new training tasks to consolidate the controllability of LLMs. Instead of creating new data from scratch, RuleR "recycles" existing data by simply applying rule-based edits to their responses and appending the rule-instructions in their original instructions. Experimental results demonstrate RuleR's effectiveness in improving LLM controllability while maintaining general instruction-following capabilities.
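The recycling step described in the abstract can be sketched in a few lines: a rule appends a constraint to the instruction and applies the matching edit to the response, so the pair remains self-consistent. The rule names, templates, and edit functions below are illustrative assumptions, not the paper's actual rule set or implementation.

```python
# Hypothetical RuleR-style data recycling: each rule (a) appends a
# constraint clause to the instruction and (b) edits the response so
# that it satisfies the new constraint. Rules compose by chaining.

def rule_uppercase(sample):
    """Constrain the response to be fully uppercase (illustrative rule)."""
    return {
        "instruction": sample["instruction"] + " Your entire response must be in uppercase.",
        "response": sample["response"].upper(),
    }

def rule_end_marker(sample, marker="THE END"):
    """Constrain the response to end with a fixed marker (illustrative rule)."""
    return {
        "instruction": sample["instruction"] + f' End your response with "{marker}".',
        "response": sample["response"].rstrip() + "\n" + marker,
    }

def recycle(sample, rules):
    """Apply rules in sequence, compounding constraints into one new sample."""
    for rule in rules:
        sample = rule(sample)
    return sample

sample = {"instruction": "Name a primary color.",
          "response": "Red is a primary color."}
new_sample = recycle(sample, [rule_uppercase, rule_end_marker])
# new_sample now carries a multi-constraint instruction and an edited,
# constraint-satisfying response, with no human annotation involved.
```

Because every rule edits both sides of the pair, the recycled sample is a valid SFT example by construction; this is what keeps the augmentation cost at zero.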
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLM response controllability
Reducing data curation costs
Recycling existing data with rules
Innovation

Methods, ideas, or system contributions that make the work stand out.

Rule-based Data Recycling method
Enhances LLM controllability
Recycles existing data with rules
Ming Li
University of Maryland
Han Chen
Chenguang Wang
Stony Brook University
Dang Nguyen
University of Maryland
Dianqi Li
University of Washington
Tianyi Zhou
University of Maryland

Deep Learning · Natural Language Processing