🤖 AI Summary
This study investigates regional disparities in hallucination rates of large language models (LLMs) when answering legal questions across three jurisdictions—Los Angeles, London, and Sydney—to assess how geographic factors affect the reliability of AI-powered legal information services.
Method: We propose a cross-jurisdictional evaluation framework grounded in comparative functionalism, construct a test dataset derived from real-world legal queries on Reddit, and generate jurisdiction-specific statutory summaries using closed-source LLMs. Hallucination rates and model uncertainty are quantified via expert annotation and multi-round response consistency analysis.
Contribution/Results: We find statistically significant regional variation in legal hallucination rates (p < 0.01), strongly negatively correlated with modal response frequency (r = −0.82), indicating systematic geographic inequity in LLMs' legal knowledge. Crucially, this work introduces response consistency as a novel, reproducible metric for jurisdictional uncertainty, establishing a methodological foundation for fairness-aware evaluation of LLMs in legal applications.
📝 Abstract
How do we meaningfully compare a large language model's knowledge of the law in one place with its knowledge in another? Quantifying these differences is critical to understanding whether the quality of legal information obtained by users of LLM-based chatbots varies with their location. However, obtaining meaningful comparative metrics is challenging because legal institutions in different places are not themselves easily comparable. In this work we propose a methodology for obtaining place-to-place metrics based on the comparative-law concept of functionalism. We construct a dataset of factual scenarios drawn from Reddit posts by users seeking legal advice on family, housing, employment, crime, and traffic issues. We use these to elicit from the LLM a summary of the law relevant to each scenario in Los Angeles, London, and Sydney. These summaries, typically of a legislative provision, are manually evaluated for hallucinations. We show that the rate at which leading closed-source LLMs hallucinate legal information is significantly associated with place. This suggests that the quality of legal solutions provided by these models is not evenly distributed across geography. Additionally, we show a strong negative correlation between hallucination rate and the frequency of the majority response when the LLM is sampled multiple times, suggesting a measure of the uncertainty of model predictions of legal facts.
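The consistency-based uncertainty measure described above can be sketched in a few lines: sample the model several times on the same query, take the fraction of samples that agree with the modal answer, and correlate that frequency with the annotated hallucination rate. This is a minimal illustration under stated assumptions, not the authors' code; the function names and the example responses are hypothetical.

```python
from collections import Counter

def majority_response_frequency(responses):
    """Fraction of sampled responses matching the modal (most common) answer.

    Per the paper's finding, lower values (less agreement across samples)
    tend to co-occur with higher hallucination rates.
    """
    counts = Counter(responses)
    modal_count = counts.most_common(1)[0][1]
    return modal_count / len(responses)

def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient, no external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical example: five samples of the same statutory-citation query.
samples = ["s 18 Crimes Act", "s 18 Crimes Act", "s 18 Crimes Act",
           "s 33 Crimes Act", "s 18 Crimes Act"]
print(majority_response_frequency(samples))  # 0.8
```

In practice each scenario would be queried multiple times per jurisdiction, and `pearson_r` applied across scenarios to pairs of (hallucination rate, majority-response frequency), which the paper reports as strongly negative (r = −0.82).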