🤖 AI Summary
This study investigates whether large language models (LLMs) exhibit motivated reasoning, that is, systematic, role-congruent bias in fact selection and argument construction, when generating role-specific legal summaries (e.g., for judges, prosecutors, or attorneys).
Method: Using a role-conditioned prompting design grounded in legal realism, we develop a dual-dimensional evaluation framework that measures both legal fact coverage and stakeholder-oriented bias. Experiments explicitly test whether “neutrality” instructions mitigate role-driven distortions.
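To make the prompting design concrete, here is a minimal Python sketch of what role-conditioned prompting with an optional neutrality instruction might look like; the role labels, prompt wording, and the `build_prompt` helper are hypothetical illustrations, not the paper's actual prompts.

```python
# Hypothetical sketch of role-conditioned prompting with an optional
# neutrality instruction. Role names and wording are illustrative only.

ROLE_PERSPECTIVES = {
    "judge": "You are assisting a judge reviewing this decision.",
    "prosecutor": "You are assisting a prosecutor preparing this case.",
    "defense_attorney": "You are assisting a defense attorney preparing this case.",
}

NEUTRALITY_INSTRUCTION = (
    "Present the facts and reasoning in a balanced way, "
    "without favoring any party's position."
)

def build_prompt(role: str, decision_text: str, neutral: bool = False) -> str:
    """Compose a role-conditioned summarization prompt for a judicial decision."""
    parts = [ROLE_PERSPECTIVES[role]]
    if neutral:
        # The "neutrality" condition adds an explicit balancing instruction.
        parts.append(NEUTRALITY_INSTRUCTION)
    parts.append("Summarize the following judicial decision:\n" + decision_text)
    return "\n\n".join(parts)

# Example: the same decision, prompted from the prosecutor's perspective,
# with and without the neutrality clause.
print(build_prompt("prosecutor", "FULL TEXT OF THE JUDICIAL DECISION", neutral=True))
```

Comparing model outputs across roles, with and without the neutrality clause, isolates whether the balancing instruction actually changes what the model chooses to include.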
Contribution/Results: Even under neutrality constraints, LLMs exhibit statistically significant, role-consistent information filtering, implicitly inferring and reinforcing position-specific stances. This work provides the first quantitative evidence of structural stance bias in LLM legal summarization and introduces the “role-aware evaluation” paradigm, establishing a methodological foundation for identifying and governing alignment risks in high-stakes legal AI applications.
📝 Abstract
Large Language Models (LLMs) are increasingly used to generate user-tailored summaries, adapting outputs to specific stakeholders. In legal contexts, this raises important questions about motivated reasoning: how models strategically frame information to align with a stakeholder's position within the legal system. Building on theories of legal realism and recent trends in legal practice, we investigate how LLMs respond to prompts conditioned on different legal roles (e.g., judges, prosecutors, attorneys) when summarizing judicial decisions. We introduce an evaluation framework that measures the inclusion of legal facts and reasoning while also scoring favorability toward stakeholders. Our results show that even when prompts include balancing instructions, models exhibit selective inclusion patterns that reflect role-consistent perspectives. These findings raise broader concerns that similar alignment may emerge as LLMs begin to infer user roles from prior interactions or context, even without explicit role instructions. Our results underscore the need for role-aware evaluation of LLM summarization behavior in high-stakes legal settings.
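As a rough illustration of the two evaluation dimensions, the following sketch assumes each decision comes with annotated key facts/reasoning steps and per-stakeholder favorability labels; the `AnnotatedItem` structure, the substring-matching heuristic, and the function names are assumptions for illustration, not the paper's actual metric.

```python
# Minimal sketch of a two-dimensional evaluation: (1) inclusion of annotated
# legal facts/reasoning, (2) favorability toward a given stakeholder.
# Annotations and the matching heuristic are hypothetical.

from dataclasses import dataclass
from typing import Optional

@dataclass
class AnnotatedItem:
    text: str                 # a key fact or reasoning step from the decision
    favors: Optional[str]     # stakeholder this item favors, or None if neutral

def inclusion_coverage(summary: str, items: list) -> float:
    """Fraction of annotated facts/reasoning steps that appear in the summary."""
    if not items:
        return 0.0
    included = [it for it in items if it.text.lower() in summary.lower()]
    return len(included) / len(items)

def favorability(summary: str, items: list, stakeholder: str) -> float:
    """Share of included stance-bearing items that favor the given stakeholder.

    0.5 ~ balanced; values above 0.5 indicate the summary skews
    toward that stakeholder.
    """
    included = [it for it in items
                if it.favors and it.text.lower() in summary.lower()]
    if not included:
        return 0.5
    return sum(it.favors == stakeholder for it in included) / len(included)
```

In a setup like this, role-consistent filtering would show up as favorability scores that shift with the prompted role even when coverage stays comparable and the neutrality instruction is present.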