🤖 AI Summary
Spanish legal texts—particularly BOE (Boletín Oficial del Estado) decrees and notices—lack accessible, concise summaries, exacerbating information overload for non-expert readers.
Method: We introduce BOE-XSUM, the first extreme summarization dataset for Spanish official legal documents, comprising 3,648 human-written plain-language summaries, each paired with its source text and document type label. We fine-tune medium-sized models, including BERTIN GPT-J 6B, in a supervised setting and compare them against general-purpose generative models used zero-shot. Summary quality is measured by accuracy.
Contribution/Results: BOE-XSUM fills a critical gap in Spanish legal extreme summarization. Fine-tuned models substantially outperform zero-shot generation: the best model reaches 41.6% accuracy versus 33.5% for the strongest zero-shot baseline (DeepSeek-R1), a 24% relative improvement, demonstrating that domain-specific data coupled with lightweight fine-tuning significantly enhances the generation of comprehensible, legally faithful summaries.
📝 Abstract
The ability to summarize long documents succinctly is increasingly important in daily life due to information overload, yet there is a notable lack of such summaries for Spanish documents in general, and in the legal domain in particular. In this work, we present BOE-XSUM, a curated dataset comprising 3,648 concise, plain-language summaries of documents sourced from Spain's "Boletín Oficial del Estado" (BOE), the State Official Gazette. Each entry in the dataset includes a short summary, the original text, and its document type label. We evaluate the performance of medium-sized large language models (LLMs) fine-tuned on BOE-XSUM, comparing them to general-purpose generative models in a zero-shot setting. Results show that fine-tuned models significantly outperform their non-specialized counterparts. Notably, the best-performing model, BERTIN GPT-J 6B (32-bit precision), achieves a 24% performance gain over the top zero-shot model, DeepSeek-R1 (accuracies of 41.6% vs. 33.5%).