CourtPressGER: A German Court Decision to Press Release Summarization Dataset

📅 2025-12-10
🤖 AI Summary
This study addresses the tension between public readability and professional accuracy in German legal document summarization. It introduces the first benchmark dataset for public-oriented German judicial summarization, comprising 6.4k triples of judgment text, human-written press release, and synthetic prompt, to support citizen-centered generation. A multidimensional evaluation framework integrates factual-consistency verification, LLM-as-judge scoring, expert ranking, and conventional metrics (ROUGE/BERTScore). Methodologically, the authors adopt a hierarchical summarization strategy covering both small and large language models. Experiments show that large-model outputs approach human-level quality, that hierarchically optimized small models markedly improve on long-text processing, and that human-written press releases remain the strongest baseline. The work fills gaps in legal NLP research concerning readability, accessibility, and civic communication.

📝 Abstract
Official court press releases from Germany's highest courts present and explain judicial rulings to the public as well as to expert audiences. Prior NLP efforts emphasize technical headnotes, ignoring citizen-oriented communication needs. We introduce CourtPressGER, a dataset of 6.4k triples: rulings, human-drafted press releases, and synthetic prompts for LLMs to generate comparable releases. This benchmark supports training and evaluating LLMs in generating accurate, readable summaries from long judicial texts. We benchmark small and large LLMs using reference-based metrics, factual-consistency checks, LLM-as-judge, and expert ranking. Large LLMs produce high-quality drafts with minimal hierarchical performance loss; smaller models require hierarchical setups for long judgments. Initial benchmarks show varying model performance, with human-drafted releases ranking highest.
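The reference-based metrics mentioned above compare a generated summary against the human-drafted press release. As an illustration of the idea (not the paper's exact scoring pipeline), here is a minimal pure-Python re-implementation of unigram-overlap ROUGE-1 F1; the function name and tokenization are simplifying assumptions:

```python
from collections import Counter

def rouge1_f(reference: str, candidate: str) -> float:
    """Illustrative ROUGE-1 F1: unigram overlap between a reference
    press release and a model-generated summary. Real evaluations
    would use a vetted package and proper tokenization."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

A perfect match scores 1.0 and fully disjoint texts score 0.0; in practice such surface-overlap scores are complemented by factual-consistency checks and human ranking, as the paper's multidimensional framework does.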
Problem

Research questions and friction points this paper is trying to address.

Generates readable summaries from German court rulings
Trains LLMs to create citizen-oriented press releases
Evaluates model performance on legal text summarization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dataset with triples for LLM training
Benchmarking LLMs with multiple evaluation metrics
Hierarchical setups for smaller model performance
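The hierarchical setup for long judgments can be pictured as a two-stage map-reduce: split the ruling into chunks that fit a small model's context window, summarize each chunk, then summarize the concatenated chunk summaries. A minimal sketch, assuming word-based chunking and an injected `summarize` callable (both hypothetical simplifications of the paper's pipeline):

```python
from typing import Callable, List

def chunk_text(text: str, max_words: int = 512, overlap: int = 64) -> List[str]:
    """Split text into overlapping word windows so no chunk
    exceeds a small model's context budget."""
    words = text.split()
    chunks, start = [], 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
        start += max_words - overlap  # slide window, keep some overlap
    return chunks

def hierarchical_summarize(text: str,
                           summarize: Callable[[str], str],
                           max_words: int = 512,
                           overlap: int = 64) -> str:
    """Map: summarize each chunk. Reduce: summarize the merged
    chunk summaries into one press-release-style draft."""
    parts = [summarize(c) for c in chunk_text(text, max_words, overlap)]
    merged = " ".join(parts)
    return merged if len(parts) == 1 else summarize(merged)
```

In use, `summarize` would wrap an LLM call with a citizen-oriented prompt; short judgments pass through a single summarization step, while long ones get the two-stage treatment that the benchmarks show is necessary for smaller models.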
Sebastian Nagl
Technical University of Munich (TUM)
Mohamed Elganayni
Technical University of Munich (TUM)
Melanie Pospisil
Technical University of Munich (TUM)
Matthias Grabmair
Technical University of Munich (TUM)
Data Science · Artificial Intelligence & Law · Knowledge Representation & Reasoning