🤖 AI Summary
Current large language models (LLMs) face significant challenges in evaluating and improving instruction-following (IF) capabilities, particularly for complex, multi-turn, and system-level instructions, due to the lack of high-quality benchmarks and to unreliable, uninterpretable reward signals. To address these limitations, we propose AdvancedIF, the first fine-grained, human-annotated benchmark designed specifically for advanced IF evaluation. Complementing it, we introduce RIFL, a novel training framework that, for the first time, turns expert-crafted scoring rubrics into learnable, structured reward signals. RIFL integrates a rubric-verification model, reward shaping, and reinforcement-learning-based post-training to enable precise, interpretable IF modeling, balancing annotation reliability with scalable automated feedback. On AdvancedIF, RIFL achieves a 6.7% absolute improvement, and it generalizes strongly across multiple public benchmarks, validating both its effectiveness and its interpretability.
📝 Abstract
Recent progress in large language models (LLMs) has led to impressive performance on a range of tasks, yet advanced instruction following (IF), especially for complex, multi-turn, and system-prompted instructions, remains a significant challenge. Rigorous evaluation and effective training for such capabilities are hindered by the lack of high-quality, human-annotated benchmarks and of reliable, interpretable reward signals. In this work, we introduce AdvancedIF (we will release this benchmark soon), a comprehensive benchmark featuring over 1,600 prompts and expert-curated rubrics that assess LLMs' ability to follow complex, multi-turn, and system-level instructions. We further propose RIFL (Rubric-based Instruction-Following Learning), a novel post-training pipeline that leverages rubric generation, a finetuned rubric verifier, and reward shaping to enable effective reinforcement learning for instruction following. Extensive experiments demonstrate that RIFL substantially improves the instruction-following abilities of LLMs, achieving a 6.7% absolute gain on AdvancedIF and strong results on public benchmarks. Our ablation studies confirm the effectiveness of each component in RIFL. This work establishes rubrics as a powerful tool for both training and evaluating advanced IF in LLMs, paving the way for more capable and reliable AI systems.
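To make the rubric-to-reward idea concrete, the following is a minimal sketch of how per-criterion verifier verdicts could be aggregated into a scalar reward for reinforcement learning. Everything here is an illustrative assumption: the function name `rubric_reward`, the boolean verdict representation, and the weighted-average scheme are hypothetical, not the actual RIFL implementation or its reward-shaping details.

```python
# Hypothetical sketch: a rubric verifier judges each criterion of an
# expert-curated rubric (True = satisfied), and the verdicts are
# aggregated into a scalar reward in [0, 1] for RL post-training.
# The aggregation scheme below is an assumption for illustration only.

def rubric_reward(verdicts, weights=None):
    """Aggregate per-criterion pass/fail verdicts into a scalar reward.

    verdicts: list of bools, one per rubric criterion.
    weights:  optional per-criterion importance weights (default: uniform).
    """
    if not verdicts:
        return 0.0
    if weights is None:
        weights = [1.0] * len(verdicts)
    total = sum(weights)
    satisfied = sum(w for v, w in zip(verdicts, weights) if v)
    return satisfied / total

# Example: 3 of 4 equally weighted criteria satisfied -> reward 0.75
print(rubric_reward([True, True, False, True]))
```

A weighted scheme like this would let critical instructions (e.g., system-level constraints) dominate the reward, while the per-criterion verdicts keep the signal interpretable, which is the property the abstract highlights.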