🤖 AI Summary
Large language models (LLMs) generalize poorly to unseen output constraints—such as "answer only with yes or no" or "mention the word 'abrakadabra' at least 3 times"—making their instruction following unreliable. Method: We apply reinforcement learning with verifiable rewards (RLVR), built on carefully designed constraint verification modules that supply precise, executable reward signals over structured training data. Contribution/Results: We introduce IFBench, a benchmark of 58 new, diverse, and challenging verifiable out-of-domain output constraints, which reveals that current models overfit to the small set of constraints in existing benchmarks. We also release 29 new hand-annotated training constraints with verification functions, plus RLVR training prompts and code. Experiments show that RLVR significantly improves constraint adherence on IFBench, pointing toward more reliable human–AI interaction.
📝 Abstract
A crucial factor for successful human–AI interaction is the ability of language models or chatbots to follow human instructions precisely. A common feature of instructions is output constraints such as "only answer with yes or no" or "mention the word 'abrakadabra' at least 3 times" that the user adds to elicit a more useful answer. Even today's strongest models struggle to fulfill such constraints. We find that most models strongly overfit to the small set of verifiable constraints in the benchmarks that test this ability, called precise instruction following, and do not generalize well to unseen output constraints. We introduce a new benchmark, IFBench, to evaluate generalization in precise instruction following on 58 new, diverse, and challenging verifiable out-of-domain constraints. In addition, we perform an extensive analysis of how, and on what data, models can be trained to improve this generalization. Specifically, we carefully design constraint verification modules and show that reinforcement learning with verifiable rewards (RLVR) significantly improves instruction following. Alongside IFBench, we release 29 additional new hand-annotated training constraints and verification functions, RLVR training prompts, and code.
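To make "verifiable constraints" concrete, here is a minimal sketch of what verification functions for the two example constraints above might look like. These are hypothetical illustrations in the spirit of the paper's verifiers, not the actual IFBench implementations; the function names and signatures are assumptions.

```python
import re

def verify_yes_no(response: str) -> bool:
    """Check the constraint "only answer with yes or no".

    A response passes if, after stripping whitespace and trailing
    punctuation, it is exactly "yes" or "no" (case-insensitive).
    """
    return response.strip().lower().rstrip(".!") in {"yes", "no"}

def verify_min_mentions(response: str, word: str, n: int) -> bool:
    """Check the constraint "mention the word <word> at least <n> times".

    Counts whole-word, case-insensitive occurrences of `word`.
    """
    matches = re.findall(rf"\b{re.escape(word)}\b", response.lower())
    return len(matches) >= n

# Example checks:
verify_yes_no("Yes.")                                        # passes
verify_min_mentions("abrakadabra, abrakadabra!", "abrakadabra", 3)  # fails (only 2)
```

Because each function returns a boolean, its output can serve directly as a binary reward signal for RLVR training.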