ShiZhi: A Chinese Lightweight Large Language Model for Court View Generation

📅 2025-10-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the difficulty of generating “Court’s View” sections for Chinese criminal judgments, where complex case facts hinder direct generation, this paper proposes ShiZhi, a lightweight Chinese large language model explicitly designed for legal-logical reasoning. It introduces CCVG, the first large-scale, open-source dataset for Chinese court-view generation, comprising more than 110,000 high-quality samples that pair fact descriptions with court views. The method employs a domain-specific fine-tuning paradigm that jointly models factual encoding and legal reasoning. Experimental results demonstrate state-of-the-art performance: 58.5 BLEU-1 on court view generation, and 86.1% accuracy with 92.5% macro F1 on charge prediction, confirming the efficacy and practicality of compact models trained on high-quality domain data. Key contributions include: (1) the first dedicated Chinese court-view generation dataset; (2) the first lightweight LLM architecture incorporating multi-stage legal-logical enhancement; and (3) a reproducible domain-specific generation benchmark.

📝 Abstract

Criminal Court View Generation (CVG) is a fundamental task in legal artificial intelligence, aiming to automatically generate the "Court View" section of a legal case document. Generating court views is challenging due to the diversity and complexity of case facts, and directly generating from raw facts may limit performance. In this paper, we present ShiZhi, the first large language model (LLM) specifically designed for court view generation. We construct a Chinese Court View Generation dataset, CCVG, of more than 110K cases, each containing fact descriptions paired with corresponding court views. Based on this dataset, ShiZhi achieves 58.5 BLEU-1 on court view generation and 86.1% accuracy with 92.5% macro F1 on charge prediction. Experimental results demonstrate that even a small LLM can generate reasonable and legally coherent court views when trained on high-quality domain-specific data. Our model and dataset are available at https://github.com/ZhitianHou/ShiZhi.
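
The three metrics reported in the abstract (BLEU-1, accuracy, macro F1) can be illustrated with a minimal sketch. This is not the authors' evaluation script: the function names are hypothetical, the charge labels are toy data, and character-level tokenization for BLEU-1 is an assumption made here because the abstract does not state how the Chinese text is tokenized.

```python
import math
from collections import Counter


def accuracy(gold, pred):
    """Fraction of cases whose predicted charge equals the gold charge."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-charge F1 over all charges seen in gold or pred."""
    labels = set(gold) | set(pred)
    f1_scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def bleu1(reference, candidate):
    """Single-reference BLEU-1: clipped unigram precision times brevity penalty.

    Assumption: each character is one token.
    """
    if not candidate:
        return 0.0
    ref_counts = Counter(reference)
    cand_counts = Counter(candidate)
    # Clip each candidate token count by its count in the reference.
    clipped = sum(min(count, ref_counts[tok]) for tok, count in cand_counts.items())
    precision = clipped / len(candidate)
    # Penalize candidates shorter than the reference.
    if len(candidate) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(reference) / len(candidate))
    return precision * brevity_penalty


# Toy charge-prediction data (illustrative only).
gold = ["theft", "fraud", "theft", "robbery"]
pred = ["theft", "theft", "theft", "robbery"]
acc = accuracy(gold, pred)
mf1 = macro_f1(gold, pred)
```

Macro F1 weights every charge equally regardless of how often it occurs, so it can diverge noticeably from accuracy when the charge distribution is imbalanced.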

Problem

Research questions and friction points this paper is trying to address.

Automatically generating court view sections from case facts

Addressing challenges in legal document diversity and complexity

Developing a specialized LLM for Chinese court view generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Chinese LLM designed for court view generation

Trained on a domain-specific dataset of more than 110K cases

Small LLM achieving legal coherence through specialized training

🔎 Similar Papers

Leveraging Large Language Models for Relevance Judgments in Legal Case Retrieval