A Comparative Study of Retrieval Methods in Azure AI Search

📅 2025-12-08
🤖 AI Summary
Legal practitioners require efficient and precise key information extraction during the Early Case Assessment (ECA) phase of electronic discovery (eDiscovery), yet conventional keyword and semantic search approaches exhibit notable limitations. This study presents the first systematic evaluation of five retrieval strategies—keyword, semantic, vector, hybrid, and hybrid-semantic—within a legal Retrieval-Augmented Generation (RAG) framework powered by Azure AI Search and large language models (LLMs) for natural-language question answering. Performance is measured along three dimensions: accuracy, relevance, and response consistency. Results demonstrate that hybrid-semantic retrieval significantly outperforms all alternatives, achieving the highest accuracy and stability in legal document review tasks. This work provides empirically grounded guidance for configuring optimal RAG systems in ECA, offering both methodological rigor and practical implementation insights. It addresses a critical gap in the literature by delivering the first comparative analysis of retrieval strategies specifically tailored to the legal domain.

📝 Abstract
Increasingly, attorneys are interested in moving beyond keyword and semantic search to improve the efficiency of how they find key information during a document review task. Large language models (LLMs) are now seen as tools that attorneys can use to ask natural language questions of their data during document review to receive accurate and concise answers. This study evaluates retrieval strategies within Microsoft Azure's Retrieval-Augmented Generation (RAG) framework to identify effective approaches for Early Case Assessment (ECA) in eDiscovery. During ECA, legal teams analyze data at the outset of a matter to gain a general understanding of the data and attempt to determine key facts and risks before beginning full-scale review. In this paper, we compare the performance of Azure AI Search's keyword, semantic, vector, hybrid, and hybrid-semantic retrieval methods. We then present the accuracy, relevance, and consistency of each method's AI-generated responses. Legal practitioners can use the results of this study to enhance how they select RAG configurations in the future.
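The five retrieval methods compared above can be thought of as combinations of three switches: whether the query text is sent, whether a query embedding is sent, and whether the semantic reranker is applied. The sketch below is one plausible reading of those modes, not the authors' configuration. The key names follow the keyword arguments of `SearchClient.search` in the `azure-search-documents` Python SDK as we understand them; the field name `content_vector`, the semantic configuration name `default`, and the helper `build_query` are hypothetical, and the mapping of "semantic" to keyword search plus reranking is an assumption.

```python
from typing import Any

MODES = ("keyword", "semantic", "vector", "hybrid", "hybrid-semantic")


def build_query(
    mode: str,
    question: str,
    embedding: list[float],
    vector_field: str = "content_vector",  # hypothetical index field name
    semantic_config: str = "default",      # hypothetical configuration name
    top: int = 5,
) -> dict[str, Any]:
    """Return search parameters for one retrieval mode as a plain dict."""
    if mode not in MODES:
        raise ValueError(f"unknown retrieval mode: {mode!r}")

    # Assumed mapping of each mode onto the three switches.
    use_text = mode != "vector"
    use_vector = mode in ("vector", "hybrid", "hybrid-semantic")
    use_reranker = mode in ("semantic", "hybrid-semantic")

    query: dict[str, Any] = {
        "search_text": question if use_text else None,
        "top": top,
    }
    if use_vector:
        # With the real SDK this entry would be a VectorizedQuery object;
        # a plain dict stands in here so the sketch has no dependencies.
        query["vector_queries"] = [
            {"vector": embedding, "k_nearest_neighbors": top, "fields": vector_field}
        ]
    if use_reranker:
        query["query_type"] = "semantic"
        query["semantic_configuration_name"] = semantic_config
    return query


if __name__ == "__main__":
    for mode in MODES:
        params = build_query(mode, "Who approved the contract?", [0.1, 0.2, 0.3])
        print(mode, sorted(key for key, value in params.items() if value is not None))
```

Keeping the mode logic in one function makes it easy to run the same question through every configuration and compare the generated answers side by side, which is the kind of comparison the study describes.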
Problem

Research questions and friction points this paper is trying to address.

Evaluates retrieval methods for legal document review efficiency.
Compares Azure AI Search techniques for Early Case Assessment.
Assesses AI response accuracy in eDiscovery RAG frameworks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid and hybrid-semantic retrieval methods compared
Azure AI Search's retrieval strategies evaluated for eDiscovery
RAG framework performance assessed for legal document review
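For readers unfamiliar with how a hybrid method merges keyword and vector results: Azure AI Search documents Reciprocal Rank Fusion (RRF) as its hybrid scoring approach. The sketch below is a minimal, generic illustration of RRF and not the service's actual implementation. The function name, the document IDs, and the constant `k = 60` (a commonly used default) are assumptions for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in,
    where rank starts at 1; the scores are summed across lists.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    keyword_hits = ["doc_a", "doc_b", "doc_c"]  # hypothetical keyword ranking
    vector_hits = ["doc_c", "doc_a", "doc_d"]   # hypothetical vector ranking
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
    # prints ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

Documents that rank well in both lists rise to the top, which is why a hybrid method can surface passages that either keyword or vector search alone would rank lower. In a hybrid-semantic setup, a reranker would then reorder the fused results.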
Qiang Mao
Legal Technology & Data Analytics, Ankura Consulting Group, LLC, Washington, D.C. USA
Han Qin
Ankura Consulting Group, LLC.
Robert Neary
Legal Technology & Data Analytics, Ankura Consulting Group, LLC, Washington, D.C. USA
Charles Wang
Professor/Director, Center for Genomics, Loma Linda University
Fusheng Wei
Legal Technology & Data Analytics, Ankura Consulting Group, LLC, Washington, D.C. USA
Jianping Zhang
Legal Technology & Data Analytics, Ankura Consulting Group, LLC, Washington, D.C. USA
Nathaniel Huber-Fliflet
Legal Technology & Data Analytics, Ankura Consulting Group, LLC, London, UK