Towards Intelligent Legal Document Analysis: CNN-Driven Classification of Case Law Texts

📅 2026-04-19
📈 Citations: 0
Influential: 0

🤖 AI Summary
Legal case documents combine formalized language, long sentences, and dense specialized terminology, which makes manual classification slow and error-prone. This work proposes a lightweight framework that combines lemmatization, subword-aware FastText embeddings, and a multi-kernel one-dimensional convolutional neural network (CNN) to classify citation treatments automatically. With only 5.1 million parameters, the model achieves 97.26% accuracy, a 96.82% macro F1-score, and 97.83% AUC-ROC, evaluated on a corpus of 25,000 annotated documents with a 75/25 training-test split. Inference takes 0.31 ms per document, more than 13 times faster than BERT, and the model surpasses the fine-tuned BERT, LSTM with FastText, CNN with random embeddings, and TF-IDF KNN baselines. The results suggest that a carefully designed CNN can be a resource-efficient alternative to heavyweight Transformer-based models.
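To make the "multi-kernel one-dimensional CNN" idea concrete, here is a minimal pure-Python sketch of the feature-extraction step only: several filter widths slide over the token axis of an embedded document, each filter is max-pooled over time, and the pooled values are concatenated. The kernel widths (3, 4, 5), the filter count, the embedding size, and the random weights are illustrative assumptions, not the configuration reported in the paper, and the helper names are hypothetical.

```python
import random


def conv1d_max(embeddings, kernel, bias=0.0):
    """Slide one filter over the token axis, apply ReLU, then max-pool over time.

    embeddings: list of token vectors, shape (seq_len, dim)
    kernel:     list of weight vectors, shape (width, dim)
    """
    width = len(kernel)
    best = 0.0  # ReLU output is never below zero
    for start in range(len(embeddings) - width + 1):
        window = embeddings[start:start + width]
        activation = bias + sum(
            w * x
            for w_row, x_row in zip(kernel, window)
            for w, x in zip(w_row, x_row)
        )
        best = max(best, activation)
    return best


def multi_kernel_features(embeddings, filter_banks):
    """Concatenate the pooled output of every filter in every bank."""
    return [conv1d_max(embeddings, k) for bank in filter_banks for k in bank]


# Toy document: 10 tokens with 4-dimensional embeddings (stand-ins for FastText vectors).
rng = random.Random(0)
dim, seq_len = 4, 10
doc = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(seq_len)]

# Assumed kernel widths and filter count, chosen only for illustration.
widths = (3, 4, 5)
filters_per_width = 2
banks = [
    [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(w)]
     for _ in range(filters_per_width)]
    for w in widths
]

features = multi_kernel_features(doc, banks)
print(len(features))  # one pooled value per filter: 3 widths x 2 filters
```

In a full classifier the concatenated feature vector would feed a dense layer with a softmax over the citation-treatment classes; that part is omitted here because the summary does not describe it.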

📝 Abstract
Legal practitioners and judicial institutions face an ever-growing volume of case-law documents characterised by formalised language, lengthy sentence structures, and highly specialised terminology, making manual triage both time-consuming and error-prone. This work presents a lightweight yet high-accuracy framework for citation-treatment classification that pairs lemmatisation-based preprocessing with subword-aware FastText embeddings and a multi-kernel one-dimensional Convolutional Neural Network (CNN). Evaluated on a publicly available corpus of 25,000 annotated legal documents with a 75/25 training-test partition, the proposed system achieves 97.26% classification accuracy and a macro F1-score of 96.82%, surpassing established baselines including fine-tuned BERT, Long Short-Term Memory (LSTM) with FastText, CNN with random embeddings, and a Term Frequency-Inverse Document Frequency (TF-IDF) k-Nearest Neighbour (KNN) classifier. The model also attains the highest Area Under the Receiver Operating Characteristic Curve (AUC-ROC) among all compared systems, 97.83%, while operating with only 5.1 million parameters and an inference latency of 0.31 ms per document, more than 13 times faster than BERT. Ablation experiments confirm the individual contribution of each pipeline component, and the confusion matrix reveals that residual errors are confined to semantically adjacent citation categories. These findings indicate that carefully designed convolutional architectures represent a scalable, resource-efficient alternative to heavyweight transformers for intelligent legal document analysis.
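The abstract reports both accuracy and macro F1 and refers to a confusion matrix. As a hedged illustration of how those two numbers relate to such a matrix, the sketch below computes them in plain Python. The 3-class matrix is made-up toy data, not the paper's results, and the function names are hypothetical.

```python
def accuracy(cm):
    """Share of all documents that sit on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def macro_f1(cm):
    """Unweighted mean of per-class F1; rows are true classes, columns are predictions."""
    n = len(cm)
    scores = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # true class c, predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, actually another class
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n


# Toy confusion matrix (illustrative only): errors cluster between neighbouring classes.
cm = [
    [50, 2, 0],
    [3, 45, 1],
    [0, 1, 48],
]

print(f"accuracy = {accuracy(cm):.4f}")
print(f"macro F1 = {macro_f1(cm):.4f}")
```

Because macro F1 weights every class equally, it falls below accuracy whenever a class is classified noticeably worse than the others, which is one reason to report both figures.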
Problem

Research questions and friction points this paper is trying to address.

legal document analysis
case law classification
citation-treatment classification
intelligent legal systems
legal text processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

CNN
FastText
legal document classification
lemmatisation
lightweight NLP
Moinul Hossain
Assistant Professor, Department of Cyber Security Engineering, George Mason University
Wireless Network Security, Spectrum Coexistence, V2V Communication Security, Cognitive Radio
Sourav Rabi Das
Department of Computer Science and Engineering, Prime University, Dhaka, Bangladesh
Zikrul Shariar Ayon
Department of Mechanical Engineering, Shahjalal University of Science and Technology, Sylhet, Bangladesh
Sadia Afrin Promi
Department of Computer Science and Engineering, American International University–Bangladesh, Dhaka 1229, Bangladesh
Ahnaf Atef Choudhury
PhD in Information Technology, George Mason University
Data Science, Applied Machine Learning, Image Processing, Medical Informatics, Natural Language Processing
Shakila Rahman
Department of Computer Science and Engineering, American International University–Bangladesh, Dhaka 1229, Bangladesh
Jia Uddin
Woosong University
Fault Diagnosis using ML, DL and TL, Multimedia Signal Processing