APEER: Automatic Prompt Engineering Enhances Large Language Model Reranking

📅 2024-06-20
🏛️ arXiv.org
📈 Citations: 32
Influential: 1
🤖 AI Summary
Large language models (LLMs) used for re-ranking in information retrieval (IR) rely heavily on manually engineered prompts, and existing automatic prompt engineering methods struggle with the complex input structure of long passage–query pairs. Method: This paper proposes APEER, a fully automated prompt optimization framework for IR re-ranking. Its core contributions include: (i) the first systematic integration of automatic prompt engineering into IR re-ranking; (ii) preference learning–driven iterative optimization; (iii) zero-shot feedback mechanisms; (iv) multi-LLM collaborative evaluation; and (v) adaptive prompt quality scoring. Results: Extensive experiments across four mainstream LLMs and ten benchmark datasets show that APEER-generated prompts consistently outperform state-of-the-art manual prompts, with an average 3.2% improvement in NDCG@10. APEER's prompts also generalize well across diverse IR tasks and LLM architectures.
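The reported gains are measured with NDCG@10, the standard graded-relevance ranking metric. As a reference point (not code from the paper), it can be computed like this:

```python
import math

def ndcg_at_k(relevances, k=10):
    """NDCG@k for a ranked list of graded relevance labels.

    DCG@k = sum over the top-k positions of (2^rel - 1) / log2(rank + 1);
    NDCG@k normalizes by the DCG of the ideal (descending) ordering.
    """
    def dcg(rels):
        return sum((2**r - 1) / math.log2(i + 2) for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A ranking already in ideal order scores 1.0; placing a relevant passage below an irrelevant one scores strictly less.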

📝 Abstract
Large Language Models (LLMs) have significantly enhanced Information Retrieval (IR) across various modules, such as reranking. Despite impressive performance, current zero-shot relevance ranking with LLMs heavily relies on human prompt engineering. Existing automatic prompt engineering algorithms primarily focus on language modeling and classification tasks, leaving the domain of IR, particularly reranking, underexplored. Directly applying current prompt engineering algorithms to relevance ranking is challenging due to the integration of query and long passage pairs in the input, where the ranking complexity surpasses classification tasks. To reduce human effort and unlock the potential of prompt optimization in reranking, we introduce a novel automatic prompt engineering algorithm named APEER. APEER iteratively generates refined prompts through feedback and preference optimization. Extensive experiments with four LLMs and ten datasets demonstrate the substantial performance improvement of APEER over existing state-of-the-art (SoTA) manual prompts. Furthermore, we find that the prompts generated by APEER exhibit better transferability across diverse tasks and LLMs. Code is available at https://github.com/jincan333/APEER.
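The abstract describes an iterative loop: generate refined prompt candidates, gather feedback, and keep the preferred ones. A minimal sketch of that pattern (not the authors' implementation; `mutate` and `score` are hypothetical stand-ins for LLM-based prompt rewriting and validation-set feedback such as NDCG@10):

```python
def optimize_prompt(seed_prompt, mutate, score, iterations=5, candidates=4):
    """Toy feedback-driven prompt refinement loop.

    Each round, generate candidate rewrites of the current best prompt
    and keep the highest-scoring one -- a greedy stand-in for the
    feedback and preference optimization steps APEER performs with LLMs.
    """
    best, best_score = seed_prompt, score(seed_prompt)
    history = [(best, best_score)]  # track the best prompt per round
    for _ in range(iterations):
        pool = [mutate(best) for _ in range(candidates)]
        for cand in pool:
            s = score(cand)
            if s > best_score:  # preference: keep the better prompt
                best, best_score = cand, s
        history.append((best, best_score))
    return best, history
```

In APEER the mutation and scoring are done by LLMs over query–passage ranking data rather than by simple local functions; this sketch only illustrates the generate-evaluate-select structure.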
Problem

Research questions and friction points this paper is trying to address.

Automating prompt engineering for LLM reranking in IR
Handling the input complexity of combined query and long-passage pairs
Enhancing prompt transferability across tasks and models
Innovation

Methods, ideas, or system contributions that make the work stand out.

APEER automates prompt engineering for LLM reranking
Iterative feedback optimizes prompt refinement
Transfers prompts across tasks and LLMs effectively