🤖 AI Summary
This paper addresses "cognitive laziness" (systematic heuristic biases such as stereotyped judgments and overgeneralization) in NLP peer review, a problem exacerbated by increasing reviewer workload. It introduces the first systematically defined, fine-grained annotated dataset targeting such biases. Methodologically, it combines human expert annotation, zero-shot LLM evaluation, instruction tuning, and controlled experiments, and proposes a feedback-guided review-revision paradigm. Results show that instruction tuning improves LLM detection accuracy by 10–20 points, and that reviews revised with this feedback are significantly more comprehensive and actionable. Key contributions: (1) the first fine-grained, laziness-oriented annotated dataset for NLP peer review; (2) a transferable framework for detecting and mitigating cognitive laziness in reviews; and (3) an open-sourced dataset and enhanced review guidelines incorporating bias-aware best practices.
📝 Abstract
Peer review is a cornerstone of quality control in scientific publishing. With increasing workloads, the unintended use of "quick" heuristics, referred to as lazy thinking, has emerged as a recurring issue that compromises review quality. Automated methods to detect such heuristics can help improve the peer-reviewing process. However, there is limited NLP research on this issue, and no real-world dataset exists to support the development of detection tools. This work introduces LazyReview, a dataset of peer-review sentences annotated with fine-grained lazy thinking categories. Our analysis reveals that Large Language Models (LLMs) struggle to detect these instances in a zero-shot setting. However, instruction-based fine-tuning on our dataset significantly boosts performance by 10–20 points, highlighting the importance of high-quality training data. Furthermore, a controlled experiment demonstrates that reviews revised with lazy thinking feedback are more comprehensive and actionable than those written without such feedback. We will release our dataset and the enhanced guidelines, which can be used to train junior reviewers in the community. (Code available here: https://github.com/UKPLab/arxiv2025-lazy-review)