Unlocking Legal Knowledge: A Multilingual Dataset for Judicial Summarization in Switzerland

📅 2024-10-17
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
In legal research, case summaries (headnotes) are critical for efficient information retrieval. However, many Swiss court decisions have no headnote, writing one by hand is time-consuming, and automating the task is harder in the multilingual (German/French/Italian) setting. To address this, we introduce the Swiss Leading Decision Summarization (SLDS) dataset, a cross-lingual resource of 18K Swiss Federal Supreme Court rulings in German, French, and Italian, each paired with a German headnote. We fine-tune and evaluate three mT5 variants and compare them with proprietary models prompted in zero-shot and one-shot settings. The proprietary models perform well without fine-tuning, but the smaller fine-tuned mT5 models remain strongly competitive. SLDS is publicly released to support further research on multilingual legal summarization and assistive tools for legal professionals.

📝 Abstract
Legal research is a time-consuming task that most lawyers face on a daily basis. A large part of legal research entails looking up relevant case law and relating it to the case at hand. Lawyers rely heavily on summaries (also called headnotes) to find the right cases quickly. However, not all decisions are annotated with headnotes, and writing them is time-consuming. Automated headnote creation has the potential to make hundreds of thousands of decisions more accessible for legal research in Switzerland alone. To kickstart this, we introduce the Swiss Leading Decision Summarization (SLDS) dataset, a novel cross-lingual resource featuring 18K court rulings from the Swiss Federal Supreme Court (SFSC) in German, French, and Italian, along with German headnotes. We fine-tune and evaluate three mT5 variants and also evaluate proprietary models. Our analysis highlights that while proprietary models perform well in zero-shot and one-shot settings, fine-tuned smaller models still provide a strong competitive edge. We publicly release the dataset to facilitate further research in multilingual legal summarization and the development of assistive technologies for legal professionals.
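The abstract contrasts fine-tuned mT5 models with proprietary models used in zero-shot and one-shot settings. The difference between the two prompting settings can be sketched as follows: a zero-shot prompt contains only an instruction and the ruling, while a one-shot prompt also includes one (ruling, German headnote) example pair. This is a minimal sketch under stated assumptions. The instruction wording, the `build_prompt` helper, and the "Decision:/Headnote:" layout are hypothetical and are not the prompts used in the paper.

```python
from typing import Optional, Tuple

# Hypothetical instruction text; the paper's actual prompts are not given here.
INSTRUCTION = (
    "Summarize the following Swiss Federal Supreme Court decision "
    "as a German headnote."
)


def build_prompt(decision: str, example: Optional[Tuple[str, str]] = None) -> str:
    """Build a zero-shot prompt, or a one-shot prompt when an example is given.

    `example` is a hypothetical (decision_text, german_headnote) pair, e.g.
    drawn from a training split. The target ruling may be in German, French,
    or Italian; the requested headnote is always German (cross-lingual).
    """
    parts = [INSTRUCTION]
    if example is not None:
        ex_decision, ex_headnote = example
        # One-shot: show a single solved example before the target ruling.
        parts.append(f"Decision:\n{ex_decision}\nHeadnote:\n{ex_headnote}")
    # The prompt ends with an open "Headnote:" slot for the model to complete.
    parts.append(f"Decision:\n{decision}\nHeadnote:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("Le recourant conteste la décision ..."))
    print("---")
    print(build_prompt(
        "Il ricorrente sostiene ...",
        example=("Der Beschwerdeführer rügt ...", "Ein deutscher Leitsatz ..."),
    ))
```

Fine-tuning, by contrast, trains the mT5 model directly on (ruling, headnote) pairs, so no in-context example is needed at inference time.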
Problem

Research questions and friction points this paper is trying to address.

Automated headnote creation for Swiss court decisions
Multilingual dataset for judicial summarization in Switzerland
Improving legal research efficiency through automated summarization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual dataset for judicial summarization
Fine-tuned mT5 variants for headnote generation
Cross-lingual resource with 18K court rulings
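The dataset described above pairs rulings in three source languages with headnotes in one target language. One way to represent such records and isolate the cross-lingual (French/Italian to German) subset is shown in the sketch below. The `SLDSRecord` class and its field names are hypothetical illustrations; the released dataset's actual schema may differ.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List

# Source languages of the rulings, per the abstract.
SOURCE_LANGUAGES = {"de", "fr", "it"}


@dataclass(frozen=True)
class SLDSRecord:
    # Hypothetical field names; check the released dataset for the real schema.
    decision: str  # full ruling text in German, French, or Italian
    language: str  # ISO 639-1 code of the ruling: "de", "fr", or "it"
    headnote: str  # target summary, always in German

    def __post_init__(self) -> None:
        # Reject records whose source language is outside the three covered.
        if self.language not in SOURCE_LANGUAGES:
            raise ValueError(f"unsupported language: {self.language!r}")


def language_counts(records: Iterable[SLDSRecord]) -> Counter:
    """Count rulings per source language."""
    return Counter(r.language for r in records)


def cross_lingual_pairs(records: Iterable[SLDSRecord]) -> List[SLDSRecord]:
    """Keep only records whose ruling is not German, i.e. fr->de and it->de pairs."""
    return [r for r in records if r.language != "de"]
```

Keeping the source language as an explicit field makes it easy to report results separately for monolingual (de->de) and cross-lingual (fr->de, it->de) pairs.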