Towards Intelligent Legal Document Analysis: CNN-Driven Classification of Case Law Texts

📅 2026-04-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Legal case documents are characterized by rigid formatting and dense technical terminology, making manual classification inefficient and error-prone. This work proposes a lightweight framework that combines lemmatization, subword-aware FastText embeddings, and a multi-kernel one-dimensional convolutional neural network (CNN) to classify citation treatment automatically. With only 5.1 million parameters, the model achieves 97.26% accuracy, 96.82% macro F1-score, and 97.83% AUC-ROC on a dataset of 25,000 annotated documents. Inference takes 0.31 milliseconds per document, more than 13 times faster than BERT. The results indicate that a carefully designed CNN can serve as an efficient alternative to heavyweight Transformer-based models while outperforming the established baselines it was compared against, including fine-tuned BERT.
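The summary reports both accuracy and macro F1-score. The sketch below illustrates how the two are derived from a confusion matrix and why they can differ when classes are imbalanced. The 3-class matrix is a made-up toy example, not data from the paper, and the function names are hypothetical.

```python
def accuracy(confusion):
    """Fraction of documents on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


def macro_f1(confusion):
    """Unweighted mean of per-class F1 scores.

    confusion[i][j] = number of documents with true class i predicted as class j.
    """
    n = len(confusion)
    f1_scores = []
    for k in range(n):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                        # true k, predicted otherwise
        fp = sum(confusion[i][k] for i in range(n)) - tp   # predicted k, true otherwise
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / n


# Toy, hypothetical confusion matrix with one large and two smaller classes.
toy = [
    [90, 10, 0],
    [5, 40, 5],
    [0, 2, 8],
]

print(f"accuracy = {accuracy(toy):.4f}")
print(f"macro F1 = {macro_f1(toy):.4f}")
```

Because macro F1 weights every class equally, weak performance on a small class pulls it below accuracy, which is one reason to report both figures.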

📝 Abstract

Legal practitioners and judicial institutions face an ever-growing volume of case-law documents characterised by formalised language, lengthy sentence structures, and highly specialised terminology, making manual triage both time-consuming and error-prone. This work presents a lightweight yet high-accuracy framework for citation-treatment classification that pairs lemmatisation-based preprocessing with subword-aware FastText embeddings and a multi-kernel one-dimensional Convolutional Neural Network (CNN). Evaluated on a publicly available corpus of 25,000 annotated legal documents with a 75/25 training-test partition, the proposed system achieves 97.26% classification accuracy and a macro F1-score of 96.82%, surpassing established baselines including fine-tuned BERT, Long Short-Term Memory (LSTM) with FastText, CNN with random embeddings, and a Term Frequency-Inverse Document Frequency (TF-IDF) k-Nearest Neighbour (KNN) classifier. The model also attains the highest Area Under the Receiver Operating Characteristic Curve (AUC-ROC) among all compared systems, 97.83%, while operating with only 5.1 million parameters and an inference latency of 0.31 ms per document, more than 13 times faster than BERT. Ablation experiments confirm the individual contribution of each pipeline component, and the confusion matrix reveals that residual errors are confined to semantically adjacent citation categories. These findings indicate that carefully designed convolutional architectures represent a scalable, resource-efficient alternative to heavyweight transformers for intelligent legal document analysis.
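To make the "multi-kernel one-dimensional CNN" idea concrete, here is a minimal, dependency-free sketch of a forward pass: parallel convolution branches of different widths slide over a sequence of token embeddings, each branch is max-pooled over time, and the pooled features are concatenated and passed to a softmax layer. It is an illustration under assumptions, not the authors' implementation: the kernel widths (3, 4, 5), filter count, embedding size, class count, and random weights are all hypothetical, and random vectors stand in for FastText embeddings.

```python
import math
import random


def conv1d_relu_maxpool(seq, kernels, biases):
    """Apply 1D filters over a sequence, then ReLU and global max pooling.

    seq:     list of T embedding vectors, each of length D
    kernels: list of filters; each filter is `width` weight vectors of length D
    Returns one pooled value per filter (0.0 if seq is shorter than the filter).
    """
    width = len(kernels[0])
    pooled = []
    for filt, bias in zip(kernels, biases):
        best = 0.0  # starting at 0 makes max() equal to max over ReLU outputs
        for t in range(len(seq) - width + 1):
            s = bias
            for k in range(width):
                s += sum(w * x for w, x in zip(filt[k], seq[t + k]))
            best = max(best, s)
        pooled.append(best)
    return pooled


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def init_model(dim, widths, n_filters, n_classes, rng):
    """Random weights only; a real model would be trained."""
    def rand_vec(n):
        return [rng.uniform(-0.1, 0.1) for _ in range(n)]

    branches = [
        {
            "kernels": [[rand_vec(dim) for _ in range(w)] for _ in range(n_filters)],
            "biases": [0.0] * n_filters,
        }
        for w in widths
    ]
    n_features = n_filters * len(widths)
    return {
        "branches": branches,
        "dense": [rand_vec(n_features) for _ in range(n_classes)],
        "dense_bias": [0.0] * n_classes,
    }


def forward(model, seq):
    # Concatenate pooled features from every kernel-width branch.
    features = []
    for branch in model["branches"]:
        features += conv1d_relu_maxpool(seq, branch["kernels"], branch["biases"])
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(model["dense"], model["dense_bias"])
    ]
    return softmax(logits)


rng = random.Random(0)
# Hypothetical sizes: 20 tokens, 8-dimensional embeddings, 3 classes.
doc = [[rng.uniform(-1, 1) for _ in range(8)] for _ in range(20)]
model = init_model(dim=8, widths=(3, 4, 5), n_filters=4, n_classes=3, rng=rng)
probs = forward(model, doc)
print(probs)
```

Using several kernel widths lets each branch respond to phrases of a different length, and global max pooling keeps the feature vector a fixed size regardless of document length.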

Problem

Research questions and friction points this paper is trying to address.

legal document analysis

case law classification

citation-treatment classification

intelligent legal systems

legal text processing

Innovation

Methods, ideas, or system contributions that make the work stand out.

CNN

FastText

legal document classification