Chinese Labor Law Large Language Model Benchmark

📅 2026-01-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of general-purpose large language models in the Chinese labor law domain, where they show insufficient factual accuracy, weak reasoning, and poor context sensitivity. To overcome these challenges, we propose LabourLawLLM, the first large language model specialized for Chinese labor law, and introduce LabourLawBench, a multi-task evaluation benchmark covering legal provision citation, knowledge-based question answering, case classification, compensation computation, named entity recognition, and legal case analysis. Trained with domain-adaptive pretraining and instruction fine-tuning, LabourLawLLM outperforms both general-purpose and existing legal large language models across task categories, as measured by ROUGE-L, accuracy, F1, soft-F1, and GPT-4-based subjective scoring. The methodology also offers a scalable path to specialized models in other legal subdomains, improving the accuracy, reliability, and societal utility of legal AI systems.

📝 Abstract

Recent advances in large language models (LLMs) have led to substantial progress in domain-specific applications, particularly within the legal domain. However, general-purpose models such as GPT-4 often struggle with specialized subdomains that require precise legal knowledge, complex reasoning, and contextual sensitivity. To address these limitations, we present LabourLawLLM, a legal large language model tailored to Chinese labor law. We also introduce LabourLawBench, a comprehensive benchmark covering diverse labor-law tasks, including legal provision citation, knowledge-based question answering, case classification, compensation computation, named entity recognition, and legal case analysis. Our evaluation framework combines objective metrics (e.g., ROUGE-L, accuracy, F1, and soft-F1) with subjective assessment based on GPT-4 scoring. Experiments show that LabourLawLLM consistently outperforms general-purpose and existing legal-specific LLMs across task categories. Beyond labor law, our methodology provides a scalable approach for building specialized LLMs in other legal subfields, improving accuracy, reliability, and societal value of legal AI applications.
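To make the objective metrics named in the abstract concrete, here is a minimal Python sketch of ROUGE-L, accuracy, and a set-based F1. It is illustrative only and not the paper's evaluation code. The function names, the character-level tokenization (a common choice for Chinese text), and the balanced F-measure are all assumptions. The abstract does not define soft-F1, so it is left out.

```python
from typing import Iterable, Sequence


def lcs_length(a: Sequence, b: Sequence) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L as a balanced F-measure over characters (assumed tokenization)."""
    cand, ref = list(candidate), list(reference)
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def accuracy(predictions: Sequence, labels: Sequence) -> float:
    """Fraction of predictions that exactly match their gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    return sum(p == g for p, g in zip(predictions, labels)) / len(labels)


def set_f1(predicted: Iterable, gold: Iterable) -> float:
    """F1 between a predicted set and a gold set, e.g. extracted entities."""
    predicted, gold = set(predicted), set(gold)
    if not predicted or not gold:
        return 0.0
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

In this sketch, ROUGE-L would suit free-text outputs such as provision citation, accuracy would suit single-label tasks such as case classification, and the set-based F1 would suit tasks that return several items, such as named entity recognition. This pairing of metrics to tasks is a guess and is not stated in the abstract.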

Problem

Research questions and friction points this paper is trying to address.

large language models

Chinese labor law

legal domain

specialized subdomains

legal reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

LabourLawLLM

LabourLawBench

domain-specific LLM