AI Summary
This study addresses the challenge of conflicting expert opinions in scientific peer review, where existing methods struggle to identify contradictory evidence at a fine-grained, full-text level and to quantify the intensity of disagreement. To tackle this, the authors propose the IMPACT framework, which integrates aspect-conditioned evidence extraction, deliberative multi-agent reasoning, and an adjudication mechanism to accurately model reviewer disagreements. They also introduce RevCI, the first fine-grained contradiction annotation benchmark tailored to full-text reviews. Furthermore, through knowledge distillation, they derive TIDE, a lightweight model that predicts contradiction evidence and intensity in a single forward pass. Experimental results demonstrate that IMPACT significantly outperforms baseline methods in both evidence identification and disagreement intensity scoring, while TIDE achieves competitive performance at substantially reduced inference cost.
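The three-stage structure described above (extraction, deliberation, adjudication) can be pictured as a small pipeline. The sketch below is hypothetical: `Evidence`, `Verdict`, `impact_pipeline`, and the keyword-based `toy_*` stubs are illustrative names rather than the authors' code, and in IMPACT each stage is carried out by language-model agents, not string matching. It only shows how aspect-conditioned extraction feeds deliberation, which in turn feeds a single adjudicated verdict.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Evidence:
    aspect: str   # review aspect the evidence concerns (hypothetical label)
    span_a: str   # sentence from review A
    span_b: str   # conflicting sentence from review B


@dataclass
class Verdict:
    evidence: List[Evidence]
    intensity: int  # graded disagreement score (scale assumed for illustration)


def impact_pipeline(
    review_a: str,
    review_b: str,
    aspects: List[str],
    extract: Callable[[str, str, str], List[Evidence]],
    deliberate: Callable[[List[Evidence]], List[Evidence]],
    adjudicate: Callable[[List[Evidence]], Verdict],
) -> Verdict:
    # Stage 1: aspect-conditioned extraction, one pass per aspect.
    candidates: List[Evidence] = []
    for aspect in aspects:
        candidates.extend(extract(review_a, review_b, aspect))
    # Stage 2: deliberation may confirm, drop, or revise candidate evidence.
    confirmed = deliberate(candidates)
    # Stage 3: adjudication turns confirmed evidence into one final verdict.
    return adjudicate(confirmed)


# --- Toy stand-ins for the agents (keyword heuristics, NOT IMPACT's method) ---
def _sentences(text: str) -> List[str]:
    return [s.strip() for s in text.split(".") if s.strip()]


def toy_extract(review_a: str, review_b: str, aspect: str) -> List[Evidence]:
    sa = [s for s in _sentences(review_a) if aspect in s.lower()]
    sb = [s for s in _sentences(review_b) if aspect in s.lower()]
    # Flag a pair when exactly one of the two sentences contains the word "not".
    return [
        Evidence(aspect, x, y)
        for x in sa
        for y in sb
        if ("not" in x.lower().split()) != ("not" in y.lower().split())
    ]


def toy_deliberate(candidates: List[Evidence]) -> List[Evidence]:
    # Drop exact duplicates while preserving order.
    seen, kept = set(), []
    for ev in candidates:
        key = (ev.aspect, ev.span_a, ev.span_b)
        if key not in seen:
            seen.add(key)
            kept.append(ev)
    return kept


def toy_adjudicate(evidence: List[Evidence]) -> Verdict:
    # Toy rule: intensity = number of conflicting pairs, capped at 3 (0 if none).
    return Verdict(evidence, min(3, len(evidence)))


verdict = impact_pipeline(
    "The method is novel. Experiments are thorough.",
    "The method is not novel. Experiments are thorough.",
    ["novel", "experiment"],
    toy_extract,
    toy_deliberate,
    toy_adjudicate,
)
print(verdict.intensity, [(e.aspect, e.span_b) for e in verdict.evidence])
# 1 [('novel', 'The method is not novel')]
```

Passing the stages in as callables keeps the control flow separate from how each stage is implemented, so the toy stubs could be swapped for model-backed agents without touching `impact_pipeline`.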
Abstract
Scientific peer reviews frequently contain conflicting expert judgments, and the increasing scale of conference submissions makes it challenging for Area Chairs and editors to reliably identify and interpret such disagreements. Existing approaches typically frame reviewer disagreement as binary contradiction detection over isolated sentence pairs, abstracting away the review-level context and obscuring differences in the severity of evaluative conflict. In this work, we introduce a fine-grained formulation of reviewer contradiction analysis that operates over full peer reviews by explicitly identifying contradiction evidence spans and assigning graded disagreement intensity scores. To support this task, we present RevCI, an expert-annotated benchmark of peer-review pairs with evidence-level contradiction annotations and graded intensity labels. We further propose IMPACT, a structured multi-agent framework that integrates aspect-conditioned evidence extraction, deliberative reasoning, and adjudication to model reviewer contradictions and their intensity. To support efficient deployment, we distill IMPACT into TIDE, a small language model that predicts contradiction evidence and intensity in a single forward pass. Experimental results show that IMPACT substantially outperforms strong single-agent and generic multi-agent baselines in both evidence identification and intensity agreement, while TIDE achieves competitive performance at significantly lower inference cost.
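The fine-grained formulation pairs located evidence spans in the full reviews with a graded intensity score for the review pair. A minimal sketch of what one such label might look like, assuming character-offset spans and a hypothetical 1 to 3 intensity scale (the abstract does not specify RevCI's schema or scale); `SpanPair`, `ContradictionLabel`, `validate`, and `to_target` are illustrative names. The JSON serialization shows one plausible way a single model could emit evidence and intensity together in one generation, which is an assumption, not TIDE's documented output format.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical graded scale; the benchmark's actual label set is not given here.
INTENSITY_LEVELS = range(1, 4)


@dataclass
class SpanPair:
    a: Tuple[int, int]  # (start, end) character offsets into review A
    b: Tuple[int, int]  # (start, end) character offsets into review B


@dataclass
class ContradictionLabel:
    pairs: List[SpanPair]  # evidence spans located in the full reviews
    intensity: int         # graded disagreement intensity for the review pair


def validate(label: ContradictionLabel, review_a: str, review_b: str) -> None:
    """Reject intensities outside the assumed scale and out-of-range spans."""
    if label.intensity not in INTENSITY_LEVELS:
        raise ValueError(f"intensity {label.intensity} outside assumed scale")
    for pair in label.pairs:
        for (start, end), text in ((pair.a, review_a), (pair.b, review_b)):
            if not 0 <= start < end <= len(text):
                raise ValueError(f"span ({start}, {end}) out of bounds")


def to_target(label: ContradictionLabel, review_a: str, review_b: str) -> str:
    """Serialize evidence text and intensity into one JSON string
    (e.g. as a single-pass generation target; an assumption)."""
    validate(label, review_a, review_b)
    evidence = [
        [review_a[p.a[0]:p.a[1]], review_b[p.b[0]:p.b[1]]] for p in label.pairs
    ]
    return json.dumps({"evidence": evidence, "intensity": label.intensity})


review_a = "The method is novel."
review_b = "The method is not novel."
label = ContradictionLabel([SpanPair((14, 19), (14, 23))], intensity=2)
print(to_target(label, review_a, review_b))
# {"evidence": [["novel", "not novel"]], "intensity": 2}
```

Storing offsets rather than copied text keeps each span anchored to its position in the full review, which is what distinguishes this formulation from contradiction detection over isolated sentence pairs.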