🤖 AI Summary
Pointwise large language model (LLM) rankers suffer from limited adherence to standardized comparative guidelines and insufficient capability in holistically evaluating complex passages. To address this, we propose a dynamic multi-perspective evaluation criterion generation method: leveraging prompt engineering to instantiate interpretable, dimension-specific criteria — covering semantics, relevance, structure, and more — in real time, and aggregating scores across these criteria into a single ranking score. This design makes LLM-based evaluation decomposable and interpretable while letting the perspectives reinforce one another. Evaluated on eight diverse datasets from the BEIR benchmark, our approach significantly improves ranking performance, yielding an average 3.2% relative gain in NDCG@10. The results demonstrate that dynamic, multi-perspective guidance effectively enhances the ranking capability of pointwise LLM rankers.
📝 Abstract
The most recent pointwise Large Language Model (LLM) rankers have achieved remarkable ranking results. However, these rankers are hindered by two major drawbacks: (1) they fail to follow standardized comparison guidance during the ranking process, and (2) they struggle to evaluate complicated passages comprehensively. To address these shortcomings, we propose to build a ranker that generates ranking scores based on a set of criteria from various perspectives. These criteria are intended to direct each perspective in providing a distinct yet synergistic evaluation. Our research, which examines eight datasets from the BEIR benchmark, demonstrates that incorporating this multi-perspective criteria ensemble approach markedly enhances the performance of pointwise LLM rankers.
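The multi-perspective idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the perspective names, the 0–4 rating scale, the prompt wording, and the mean aggregation are all assumptions for the sake of the example, and `llm_score` stands in for a call to an actual LLM.

```python
from statistics import mean

# Hypothetical perspectives; the paper's actual criteria are generated
# dynamically per query/passage and may differ.
PERSPECTIVES = ["semantic consistency", "topical relevance", "structural coherence"]

def criterion_prompt(perspective, query, passage):
    # Each perspective yields its own evaluation criterion in the prompt.
    return (f"Rate from 0 to 4 how well the passage satisfies the criterion "
            f"'{perspective}' for the query.\nQuery: {query}\nPassage: {passage}")

def multi_perspective_score(llm_score, query, passage):
    # Pointwise scoring: one LLM call per perspective, aggregated (mean here).
    scores = [llm_score(criterion_prompt(p, query, passage)) for p in PERSPECTIVES]
    return mean(scores)

def rank(llm_score, query, passages):
    # Score each passage independently and sort by aggregate score, descending.
    return sorted(
        passages,
        key=lambda p: multi_perspective_score(llm_score, query, p),
        reverse=True,
    )
```

With a real model, `llm_score` would send the prompt to the ranker LLM and parse a numeric rating from its response; weighted aggregation could replace the plain mean if some perspectives matter more for a given query.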