A Data Science Approach to Calcutta High Court Judgments: An Efficient LLM and RAG-powered Framework for Summarization and Similar Cases Retrieval

๐Ÿ“… 2025-06-28
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF

career value

175K/year
๐Ÿค– AI Summary
To address the challenges of lengthy judgment texts and inefficient case retrieval in the Calcutta High Court, this study proposes an end-to-end legal intelligence analysis framework integrating fine-tuned PEGASUS with Retrieval-Augmented Generation (RAG). Methodologically, we introduce a novel two-stage summarization pipeline: first, extracting salient legal elements to generate semantically coherent, concise summaries; second, constructing a high-quality vector database for semantic retrieval. RAG is then employed to enable precise, semantics-driven case matching. The framework significantly enhances legal text comprehension and information access efficiency, outperforming baseline models in both summary quality (ROUGE-L +12.3%) and retrieval accuracy (Top-5 Recall +18.7%). This work establishes a scalable, interpretable AI-assisted decision-making paradigm tailored for resource-constrained judicial systems.

Technology Category

Application Category

๐Ÿ“ Abstract
The judiciary, as one of democracy's three pillars, is dealing with a rising amount of legal issues, needing careful use of judicial resources. This research presents a complex framework that leverages Data Science methodologies, notably Large Language Models (LLM) and Retrieval-Augmented Generation (RAG) techniques, to improve the efficiency of analyzing Calcutta High Court verdicts. Our framework focuses on two key aspects: first, the creation of a robust summarization mechanism that distills complex legal texts into concise and coherent summaries; and second, the development of an intelligent system for retrieving similar cases, which will assist legal professionals in research and decision making. By fine-tuning the Pegasus model using case head note summaries, we achieve significant improvements in the summarization of legal cases. Our two-step summarizing technique preserves crucial legal contexts, allowing for the production of a comprehensive vector database for RAG. The RAG-powered framework efficiently retrieves similar cases in response to user queries, offering thorough overviews and summaries. This technique not only improves legal research efficiency, but it also helps legal professionals and students easily acquire and grasp key legal information, benefiting the overall legal scenario.
Problem

Research questions and friction points this paper is trying to address.

Summarizing complex legal texts efficiently
Retrieving similar legal cases intelligently
Enhancing legal research and decision-making
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLM for legal text summarization
Implements RAG for similar case retrieval
Fine-tunes Pegasus model for improved accuracy