A Data Science Approach to Calcutta High Court Judgments: An Efficient LLM and RAG-powered Framework for Summarization and Similar Cases Retrieval

📅 2025-06-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the challenges of lengthy judgment texts and inefficient case retrieval in the Calcutta High Court, this study proposes an end-to-end legal intelligence analysis framework integrating fine-tuned PEGASUS with Retrieval-Augmented Generation (RAG). Methodologically, we introduce a novel two-stage summarization pipeline: first, extracting salient legal elements to generate semantically coherent, concise summaries; second, constructing a high-quality vector database for semantic retrieval. RAG is then employed to enable precise, semantics-driven case matching. The framework significantly enhances legal text comprehension and information access efficiency, outperforming baseline models in both summary quality (ROUGE-L +12.3%) and retrieval accuracy (Top-5 Recall +18.7%). This work establishes a scalable, interpretable AI-assisted decision-making paradigm tailored for resource-constrained judicial systems.

Technology Category

Application Category

📝 Abstract
The judiciary, as one of democracy's three pillars, is dealing with a rising amount of legal issues, needing careful use of judicial resources. This research presents a complex framework that leverages Data Science methodologies, notably Large Language Models (LLM) and Retrieval-Augmented Generation (RAG) techniques, to improve the efficiency of analyzing Calcutta High Court verdicts. Our framework focuses on two key aspects: first, the creation of a robust summarization mechanism that distills complex legal texts into concise and coherent summaries; and second, the development of an intelligent system for retrieving similar cases, which will assist legal professionals in research and decision making. By fine-tuning the Pegasus model using case head note summaries, we achieve significant improvements in the summarization of legal cases. Our two-step summarizing technique preserves crucial legal contexts, allowing for the production of a comprehensive vector database for RAG. The RAG-powered framework efficiently retrieves similar cases in response to user queries, offering thorough overviews and summaries. This technique not only improves legal research efficiency, but it also helps legal professionals and students easily acquire and grasp key legal information, benefiting the overall legal scenario.
Problem

Research questions and friction points this paper is trying to address.

Summarizing complex legal texts efficiently
Retrieving similar legal cases intelligently
Enhancing legal research and decision-making
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses LLM for legal text summarization
Implements RAG for similar case retrieval
Fine-tunes Pegasus model for improved accuracy
🔎 Similar Papers
No similar papers found.
P
Puspendu Banerjee
A.K.Choudhury School of IT, University of Calcutta, Kolkata, West Bengal, India.
A
Aritra Mazumdar
A.K.Choudhury School of IT, University of Calcutta, Kolkata, West Bengal, India.
W
Wazib Ansar
A.K.Choudhury School of IT, University of Calcutta, Kolkata, West Bengal, India.
S
Saptarsi Goswami
Department of Computer Science, Bangabasi Morning College, Kolkata, West Bengal, India.
Amlan Chakrabarti
Amlan Chakrabarti
Professor and Director, A.K.Choudhury School of Information Technology, University of
Quantum ComputingVLSI DesignEmbedded SystemsExpert SystemsComputer Vision