🤖 AI Summary
This work addresses the challenge of generating constructive peer feedback that offers tangible improvement value to scientific authors. To this end, we introduce GoodPoint-ICLR, the first large-scale dataset annotated with author responses, and propose a novel definition of feedback effectiveness from the author’s perspective—balancing both validity and actionability—with author responses serving as success signals to guide model training. Building upon Qwen3-8B, our approach integrates supervised fine-tuning with an optimization strategy that fuses real and synthetic preference data. Evaluated on a test set of 1,200 ICLR papers, our model improves feedback success prediction by 83.7% over baseline methods and outperforms Gemini-3-flash in matching human-written gold-standard feedback. Expert evaluations further confirm its significantly higher practical utility for authors.
📝 Abstract
While LLMs hold significant potential to transform scientific research, we advocate for their use to augment and empower researchers rather than to automate research without human oversight. To this end, we study constructive feedback generation, the task of producing targeted, actionable feedback that helps authors improve both their research and its presentation. In this work, we operationalize the effectiveness of feedback along two author-centric axes-validity and author action. We first curate GoodPoint-ICLR, a dataset of 19K ICLR papers with reviewer feedback annotated along both dimensions using author responses. Building on this, we introduce GoodPoint, a training recipe that leverages success signals from author responses through fine-tuning on valid and actionable feedback, together with preference optimization on both real and synthetic preference pairs. Our evaluation on a benchmark of 1.2K ICLR papers shows that a GoodPoint-trained Qwen3-8B improves the predicted success rate by 83.7% over the base model and sets a new state-of-the-art among LLMs of similar size in feedback matching on a golden human feedback set, even surpassing Gemini-3-flash in precision. We further validate these findings through an expert human study, demonstrating that GoodPoint consistently delivers higher practical value as perceived by authors.