🤖 AI Summary
This work proposes a problem setting and method for automatically distilling reusable expert rubrics from fragmented inline comments. Targeting writing and evaluation tasks, the approach uses large language model–driven conditional generation, contrastive learning, and iterative refinement to reverse-engineer implicit assessment criteria from feedback on human-written or model-generated drafts. At each iteration, the method compares the comments predicted under the current rubric with the existing reference comments and revises the rubric to resolve the mismatches, progressively refining the structured scoring guidelines. Evaluated in both real-world review settings and controlled settings with reference rubrics, the framework distills unstructured comments into coherent rubrics that support comment prediction, rubric understanding, and automated text revision.
📝 Abstract
Large language models (LLMs) are increasingly used for writing and review support, but their usefulness depends on context-dependent criteria, such as expert preferences or organization-specific conventions, that are often tacit, undocumented, and difficult to elicit directly. We propose a problem setting for learning reusable natural-language rubrics from accumulated inline comments on artifacts such as human-written or LLM-generated drafts. Our method infers rubrics from these comments and iteratively refines them by observing comment-wise mismatches between rubric-conditioned predictions and reference comments. We evaluate the proposed method in real-world review settings and in controlled settings with reference rubrics. The results show that inline comments can be distilled into reusable rubrics that support comment prediction, rubric understanding, and automatic artifact revision.
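The infer-then-refine loop described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `refine_rubric`, `toy_propose`, `toy_predict`, `toy_revise`, and `TRIGGERS` are hypothetical, a rubric is reduced to a set of criterion names, and simple keyword matching stands in for the LLM calls that would propose a rubric, predict comments, and revise the rubric.

```python
from typing import Callable, List, Set, Tuple

Comment = Tuple[str, str]  # (commented span, criterion the comment invokes)
Rubric = Set[str]          # toy rubric: a set of criterion names

def refine_rubric(
    drafts: List[str],
    reference: Set[Comment],
    propose: Callable[[Set[Comment]], Rubric],
    predict: Callable[[Rubric, List[str]], Set[Comment]],
    revise: Callable[[Rubric, Set[Comment], Set[Comment]], Rubric],
    max_rounds: int = 5,
) -> Rubric:
    """Infer a rubric, then revise it until predictions match the reference."""
    rubric = propose(reference)
    for _ in range(max_rounds):
        predicted = predict(rubric, drafts)
        missed = reference - predicted    # reference comments the rubric fails to predict
        spurious = predicted - reference  # predicted comments no reviewer wrote
        if not missed and not spurious:
            break
        rubric = revise(rubric, missed, spurious)
    return rubric

# --- Toy stand-ins for the LLM calls (assumptions, for illustration only) ---
TRIGGERS = {"intensifier": "very", "passive": "was", "jargon": "utilize"}

def toy_propose(reference: Set[Comment]) -> Rubric:
    # Over-generate: every criterion whose trigger word occurs in a commented span.
    return {c for span, _ in reference for c, w in TRIGGERS.items() if w in span.split()}

def toy_predict(rubric: Rubric, drafts: List[str]) -> Set[Comment]:
    # Emit a comment wherever a rubric criterion's trigger word occurs.
    return {(s, c) for s in drafts for c in rubric if TRIGGERS[c] in s.split()}

def toy_revise(rubric: Rubric, missed: Set[Comment], spurious: Set[Comment]) -> Rubric:
    # Add criteria behind missed comments, drop criteria behind spurious ones.
    return (rubric | {c for _, c in missed}) - {c for _, c in spurious}

drafts = ["The method was very slow", "We utilize a cache", "Results improved"]
reference = {
    ("The method was very slow", "intensifier"),
    ("We utilize a cache", "jargon"),
}
rubric = refine_rubric(drafts, reference, toy_propose, toy_predict, toy_revise)
print(sorted(rubric))
```

In this toy run the initial rubric also contains `passive`, because "was" appears in a commented span; the first round flags the resulting comment as spurious and the revision drops that criterion. With real models, exact set equality between predicted and reference comments would presumably be replaced by a softer, comment-wise matching judgment.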