🤖 AI Summary
Large language models (LLMs) pose judicial fairness risks in legal reasoning due to their opaque “black-box” nature, yet no systematic fairness evaluation framework exists for the legal domain.
Method: We propose JustEva, an open-source evaluation toolkit that supports structured output generation, multi-dimensional fairness quantification, and statistical inference. It introduces a fine-grained label system covering 65 extra-legal factors and defines three core fairness metrics: inconsistency, bias, and imbalanced inaccuracy. Visualization and regression analysis are integrated to aid interpretation.
Contribution/Results: Empirical evaluation reveals significant fairness deficiencies in current LLMs on legal tasks, highlighting the lack of fair and trustworthy LLM-based legal tools. JustEva provides a convenient tool and a methodological foundation for evaluating and improving algorithmic fairness in the legal domain.
📝 Abstract
The integration of Large Language Models (LLMs) into legal practice raises pressing concerns about judicial fairness, particularly because of their "black-box" nature. This study introduces JustEva, a comprehensive, open-source evaluation toolkit designed to measure LLM fairness in legal tasks. JustEva offers four main features: (1) a structured label system covering 65 extra-legal factors; (2) three core fairness metrics: inconsistency, bias, and imbalanced inaccuracy; (3) robust statistical inference methods; and (4) informative visualizations. The toolkit supports two types of experiments that together form a complete evaluation workflow: (1) generating structured outputs from LLMs using a provided dataset, and (2) conducting statistical analysis and inference on those outputs through regression and other statistical methods. Empirical application of JustEva reveals significant fairness deficiencies in current LLMs, highlighting the lack of fair and trustworthy LLM-based legal tools. JustEva offers a convenient tool and a methodological foundation for evaluating and improving algorithmic fairness in the legal domain.
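The abstract names the three metrics but does not define them. The following is a minimal sketch of one plausible reading, not JustEva's actual API or formulas. It assumes a paired design in which each case is shown to the model in two variants that differ only in a single extra-legal label, and the model returns a numeric outcome such as sentence length in months. The record layout, the function names, and the toy numbers are all hypothetical.

```python
from statistics import mean

# Hypothetical records (not from the paper):
# (case_id, extra_legal_label_value, model_output_months, reference_months)
records = [
    ("c1", "A", 12.0, 12.0),
    ("c1", "B", 12.0, 12.0),
    ("c2", "A", 24.0, 20.0),
    ("c2", "B", 30.0, 20.0),
    ("c3", "A", 6.0, 8.0),
    ("c3", "B", 10.0, 8.0),
]


def group_by_case(rows):
    """Map case_id -> {label_value: (output, reference)}."""
    cases = {}
    for case_id, value, output, reference in rows:
        cases.setdefault(case_id, {})[value] = (output, reference)
    return cases


def inconsistency(rows):
    """Assumed definition: share of cases whose output changes
    when only the extra-legal label changes."""
    cases = group_by_case(rows)
    changed = sum(
        1 for variants in cases.values()
        if len({out for out, _ in variants.values()}) > 1
    )
    return changed / len(cases)


def bias(rows, treated="B", baseline="A"):
    """Assumed definition: mean signed within-case difference in output
    between the two label values (direction of the shift)."""
    cases = group_by_case(rows)
    return mean(v[treated][0] - v[baseline][0] for v in cases.values())


def imbalanced_inaccuracy(rows, treated="B", baseline="A"):
    """Assumed definition: gap in mean absolute error against the
    reference outcome between the two label values."""
    def mae(value):
        return mean(abs(out - ref) for _, val, out, ref in rows if val == value)
    return mae(treated) - mae(baseline)


print(inconsistency(records))
print(bias(records))
print(imbalanced_inaccuracy(records))
```

In a balanced paired design like this one, the mean within-case difference coincides with the coefficient on the label indicator in a least-squares regression with case fixed effects, which is one way the regression-based inference mentioned in the abstract could attach a standard error and significance test to the bias estimate.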