Generalizing Verifiable Instruction Following

📅 2025-07-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) generalize poorly to unseen output constraints, such as "answer only with 'yes' or 'no'" or "mention 'abrakadabra' at least three times", resulting in unreliable precise instruction following. Method: The authors apply RLVR (reinforcement learning with verifiable rewards), built on carefully designed constraint verification modules that supply structured training data and precise, executable reward signals. Contribution/Results: They introduce IFBench, a new benchmark of 58 verifiable, diverse, and out-of-domain output constraints, which reveals severe overfitting to existing benchmark constraints in current models. They also release 29 new hand-annotated training constraints with their verification functions, RLVR training prompts, and code. Experiments show that RLVR significantly improves adherence to unseen constraints on IFBench, a step toward more reliable human-AI interaction.

📝 Abstract
A crucial factor for successful human and AI interaction is the ability of language models or chatbots to follow human instructions precisely. A common feature of instructions is output constraints like "only answer with yes or no" or "mention the word 'abrakadabra' at least 3 times" that the user adds to craft a more useful answer. Even today's strongest models struggle with fulfilling such constraints. We find that most models strongly overfit on a small set of verifiable constraints from the benchmarks that test these abilities, a skill called precise instruction following, and are not able to generalize well to unseen output constraints. We introduce a new benchmark, IFBench, to evaluate precise instruction following generalization on 58 new, diverse, and challenging verifiable out-of-domain constraints. In addition, we perform an extensive analysis of how and on what data models can be trained to improve precise instruction following generalization. Specifically, we carefully design constraint verification modules and show that reinforcement learning with verifiable rewards (RLVR) significantly improves instruction following. In addition to IFBench, we release 29 additional new hand-annotated training constraints and verification functions, RLVR training prompts, and code.
Problem

Research questions and friction points this paper is trying to address.

Models struggle with following diverse output constraints
Overfitting on benchmark constraints limits generalization
Need for better training methods to improve instruction adherence
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces IFBench for diverse constraint evaluation
Designs constraint verification modules that supply executable reward signals
Applies RLVR to improve instruction following
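To make the idea of verifiable constraints concrete, here is a minimal sketch of what constraint verification functions and the resulting binary RLVR reward could look like. The function names and the all-constraints-must-pass reward are illustrative assumptions, not the paper's actual implementation.

```python
import re

# Hypothetical verifiers in the spirit of IFBench's validation functions;
# the paper's actual function names and signatures may differ.

def only_yes_or_no(response: str) -> bool:
    """Constraint: answer only with 'yes' or 'no'."""
    return response.strip().lower() in {"yes", "no"}

def mention_at_least(response: str, word: str, n: int) -> bool:
    """Constraint: mention `word` at least `n` times (case-insensitive)."""
    return len(re.findall(re.escape(word), response, flags=re.IGNORECASE)) >= n

def rlvr_reward(response: str, verifiers) -> float:
    """Binary verifiable reward: 1.0 only if every constraint check passes."""
    return 1.0 if all(check(response) for check in verifiers) else 0.0

checks = [lambda r: mention_at_least(r, "abrakadabra", 3)]
print(rlvr_reward("abrakadabra, abrakadabra... abrakadabra!", checks))  # 1.0
```

Because each constraint is checked by executable code rather than a learned judge, the reward is exact and cheap to compute, which is what makes RL with verifiable rewards practical here.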