QueryGym: A Toolkit for Reproducible LLM-Based Query Reformulation

📅 2025-11-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM-driven query reformulation research lacks unified tooling, leading to inconsistent method implementations, poor experimental reproducibility, and unfair benchmark evaluation. To address this, we introduce QueryGym, an open-source Python toolkit designed specifically for LLM-based query reformulation. It provides standardized interfaces, modular APIs, centralized prompt management with version control, and seamless integration with retrieval frameworks such as Pyserini and PyTerrier. The toolkit supports major benchmarks including BEIR and MS MARCO. QueryGym uniformly encapsulates diverse LLM reformulation strategies (e.g., rewriting, expansion, and decomposition), thereby improving experimental reproducibility, method comparability, and system extensibility. Released under an open-source license, it enables community-driven development and continuous improvement.

📝 Abstract
We present QueryGym, a lightweight, extensible Python toolkit that supports large language model (LLM)-based query reformulation. This tool is timely: recent work on LLM-based query reformulation has shown notable increases in retrieval effectiveness. However, while individual authors have sporadically shared implementations of their methods, there is no unified toolkit that provides consistent implementations of such methods, which hinders fair comparison, rapid experimentation, consistent benchmarking, and reliable deployment. QueryGym addresses this gap by providing a unified framework for implementing, executing, and comparing LLM-based reformulation methods. The toolkit offers: (1) a Python API for applying diverse LLM-based methods; (2) a retrieval-agnostic interface supporting integration with backends such as Pyserini and PyTerrier; (3) a centralized prompt management system with versioning and metadata tracking; (4) built-in support for benchmarks like BEIR and MS MARCO; and (5) a fully open-source, extensible implementation available to all researchers. QueryGym is publicly available at https://github.com/radinhamidi/QueryGym.
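To make the abstract's design concrete, here is a minimal sketch of what a unified, retrieval-agnostic reformulation interface might look like. All names (`Reformulator`, `ExpansionReformulator`) are hypothetical illustrations, not QueryGym's actual API; the "expansion" here uses a fixed term table where a real implementation would call an LLM.

```python
# Hypothetical sketch of a unified query-reformulation interface;
# class and method names are illustrative, not QueryGym's actual API.
from abc import ABC, abstractmethod


class Reformulator(ABC):
    """Common interface so different LLM strategies are interchangeable."""

    @abstractmethod
    def reformulate(self, query: str) -> list[str]:
        """Return one or more reformulated queries."""


class ExpansionReformulator(Reformulator):
    """Toy expansion: append fixed related terms (an LLM would generate these)."""

    def __init__(self, terms: dict[str, list[str]]):
        self.terms = terms

    def reformulate(self, query: str) -> list[str]:
        # Collect expansion terms for every word in the query.
        extra = [t for w in query.split() for t in self.terms.get(w, [])]
        return [query + " " + " ".join(extra)] if extra else [query]


# Because every strategy emits plain query strings, any retrieval
# backend (e.g. Pyserini, PyTerrier) can consume them the same way.
r = ExpansionReformulator({"llm": ["language", "model"]})
print(r.reformulate("llm retrieval"))  # → ['llm retrieval language model']
```

A shared base class like this is what lets rewriting, expansion, and decomposition methods be benchmarked under identical conditions: the surrounding pipeline never changes, only the `Reformulator` instance does.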
Problem

Research questions and friction points this paper is trying to address.

Lack of unified toolkit for LLM-based query reformulation methods
Hinders fair comparison and consistent benchmarking of approaches
Gap in reproducible experimentation and reliable deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified Python toolkit for LLM query reformulation
Retrieval-agnostic interface supporting multiple backends
Centralized prompt management with versioning system
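The centralized prompt management idea can be sketched as a small versioned store. This is a hypothetical illustration under my own naming (`PromptStore`, `register`, `get`), not QueryGym's actual implementation; it only shows how versioning plus metadata tracking might fit together.

```python
# Hypothetical sketch of centralized prompt management with versioning
# and metadata tracking; illustrative only, not QueryGym's implementation.
from dataclasses import dataclass, field


@dataclass
class PromptStore:
    """Keeps every version of each named prompt, plus arbitrary metadata."""
    _store: dict[str, list[dict]] = field(default_factory=dict)

    def register(self, name: str, template: str, **metadata) -> int:
        """Append a new version; return its 1-based version number."""
        versions = self._store.setdefault(name, [])
        versions.append({"template": template, "meta": metadata})
        return len(versions)

    def get(self, name: str, version: int = -1) -> str:
        """Fetch a specific version, or the latest by default."""
        versions = self._store[name]
        idx = version - 1 if version > 0 else version
        return versions[idx]["template"]


store = PromptStore()
store.register("rewrite", "Rewrite the query: {query}", author="alice")
store.register("rewrite", "Rewrite concisely: {query}", author="bob")
print(store.get("rewrite", version=1))  # first version
print(store.get("rewrite"))             # latest version
```

Keeping prompts out of method code and behind a versioned store is what makes experiments repeatable: a result can be pinned to an exact prompt version rather than whatever string happened to be inlined at run time.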