Lightweight Relevance Grader in RAG

📅 2025-06-17
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address the challenges of verifying retrieved document relevance and the high computational cost of high-accuracy scorers in RAG systems, this paper proposes a lightweight relevance classifier: the authors fine-tune Llama-3.2-1B via supervised learning to perform binary relevance classification. Evaluated on standard benchmarks, the model achieves a precision of 0.7750, comparable to Llama-3.1-70B (0.7742), while using only 1.4% of its parameters and significantly reducing inference latency and GPU memory consumption. Integrated into the RAG pipeline, it improves the precision of relevant-document retrieval from 0.1301 to 0.7750. The model is also small enough for edge deployment, enabling efficient, low-cost RAG implementations. This work demonstrates a resource-efficient approach to relevance scoring in production RAG systems.

๐Ÿ“ Abstract
Retrieval-Augmented Generation (RAG) addresses limitations of large language models (LLMs) by leveraging a vector database to provide more accurate and up-to-date information. When a user submits a query, RAG executes a vector search to find relevant documents, which are then used to generate a response. However, ensuring that the retrieved documents are actually relevant to the query remains a significant challenge. To address this, a secondary model, known as a relevance grader, can be used to verify the relevance of each retrieved document. To reduce the computational requirements of the relevance grader, a lightweight small language model is preferred. In this work, we finetuned llama-3.2-1b as a relevance grader and achieved a significant increase in precision, from 0.1301 to 0.7750, comparable to that of llama-3.1-70b. Our code is available at https://github.com/taeheej/Lightweight-Relevance-Grader-in-RAG.
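The pipeline described in the abstract (vector search, then a relevance-grading pass before generation) can be sketched as follows. This is a minimal illustration, not the paper's implementation: `grade_relevance` stands in for the fine-tuned Llama-3.2-1B classifier, and `retrieve` stands in for the vector search; both are replaced here by simple term-overlap heuristics so the example is self-contained.

```python
# Sketch of a RAG pipeline with a relevance-grading stage.
# Both functions below are placeholders: a real system would use a
# vector database for retrieve() and the fine-tuned SLM for grade_relevance().

def grade_relevance(query: str, document: str) -> bool:
    """Binary relevance decision (stand-in for the SLM grader)."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    # Call a document relevant if at least half the query terms appear in it.
    overlap = len(query_terms & doc_terms)
    return overlap >= max(1, len(query_terms) // 2)

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Placeholder vector search: rank by term overlap, return top-k."""
    query_terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda d: len(query_terms & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def rag_with_grader(query: str, corpus: list[str]) -> list[str]:
    """Retrieve candidates, then keep only grader-approved documents."""
    candidates = retrieve(query, corpus)
    return [doc for doc in candidates if grade_relevance(query, doc)]

corpus = [
    "RAG retrieves documents from a vector database",
    "bananas are a popular fruit",
    "relevance grading filters retrieved documents in RAG",
]
print(rag_with_grader("relevance grading in RAG", corpus))
```

Only the grader-approved documents reach the generation step, which is how the paper's filter raises precision from 0.1301 to 0.7750.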
Problem

Research questions and friction points this paper is trying to address.

Ensuring relevance of retrieved documents in RAG
Reducing computational cost of relevance grading
Improving precision with lightweight small language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Lightweight small language model for relevance grading
Finetuned llama-3.2-1b as a relevance grader
Achieved high precision comparable to larger models
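The supervised fine-tuning of a binary relevance grader (the innovation listed above) typically frames each training example as a query-document prompt with a yes/no target. The paper's exact template is not reproduced on this page, so the field names and wording below are illustrative assumptions.

```python
# Hypothetical prompt/label format for supervised fine-tuning of a
# binary relevance grader. The template text and dict keys are
# illustrative, not the paper's actual format.

def build_example(query: str, document: str, is_relevant: bool) -> dict:
    """One supervised training example: a prompt plus a yes/no target."""
    prompt = (
        "Decide whether the document is relevant to the query. "
        "Answer 'yes' or 'no'.\n"
        f"Query: {query}\n"
        f"Document: {document}\n"
        "Answer:"
    )
    return {"prompt": prompt, "completion": "yes" if is_relevant else "no"}

def parse_grade(completion: str) -> bool:
    """Map the model's text output back to a binary relevance decision."""
    return completion.strip().lower().startswith("yes")

example = build_example("what is RAG?", "RAG augments LLMs with retrieval", True)
print(example["completion"])
```

At inference time, the same prompt is sent to the fine-tuned model and `parse_grade` converts its short text completion into the binary decision used to filter retrieved documents.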