SteuerLLM: Local specialized large language model for German tax law analysis

📅 2026-02-11
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study addresses the limitations of general-purpose large language models in highly structured, regulation-intensive domains such as German tax law, where precise statutory citation, structured legal reasoning, and numerical computation are critical. To close this gap, the authors introduce SteuerEx, the first fine-grained open benchmark built from authentic German university tax law exam questions. They also propose a domain-adaptation approach that trains on synthetic data generated from authentic exam material through a controlled retrieval-augmented pipeline. The resulting 28B-parameter model, SteuerLLM, consistently outperforms general-purpose models of comparable size across six core tax law domains and, in several cases, surpasses substantially larger systems. Because the evaluation uses realistic examination questions with statement-level partial-credit grading, these results suggest that domain specialization matters more than simply scaling up model size.

📝 Abstract
Large language models (LLMs) demonstrate strong general reasoning and language understanding, yet their performance degrades in domains governed by strict formal rules, precise terminology, and legally binding structure. Tax law exemplifies these challenges, as correct answers require exact statutory citation, structured legal argumentation, and numerical accuracy under rigid grading schemes. We introduce SteuerEx, the first open benchmark derived from authentic German university tax law examinations. SteuerEx comprises 115 expert-validated examination questions spanning six core tax law domains and multiple academic levels, and employs a statement-level, partial-credit evaluation framework that closely mirrors real examination practice. We further present SteuerLLM, a domain-adapted LLM for German tax law trained on a large-scale synthetic dataset generated from authentic examination material using a controlled retrieval-augmented pipeline. SteuerLLM (28B parameters) consistently outperforms general-purpose instruction-tuned models of comparable size and, in several cases, substantially larger systems, demonstrating that domain-specific data and architectural adaptation are more decisive than parameter scale for performance on realistic legal reasoning tasks. All benchmark data, training datasets, model weights, and evaluation code are released openly to support reproducible research in domain-specific legal artificial intelligence. A web-based demo of SteuerLLM is available at https://steuerllm.i5.ai.fau.de.
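The abstract names a statement-level, partial-credit evaluation but does not say how individual statements are matched. The following is a minimal sketch of the general idea under stated assumptions, not the SteuerEx grader: `GradedStatement`, `partial_credit_score`, and `naive_keyword_judge` are hypothetical names, each expected statement is assumed to carry a point value, and the substring-matching judge is only a stand-in for whatever expert or model-based grading the benchmark actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class GradedStatement:
    """Hypothetical: one expected statement from a reference solution and its point value."""
    text: str
    points: float


def partial_credit_score(
    answer: str,
    statements: List[GradedStatement],
    is_satisfied: Callable[[str, GradedStatement], bool],
) -> float:
    """Return the fraction of available points that `answer` earns.

    Each statement is judged independently, so an answer that cites the
    correct provision but reaches the wrong figure still earns the points
    attached to the citation.
    """
    total = sum(s.points for s in statements)
    if total == 0:
        return 0.0  # no gradable statements: nothing can be earned
    earned = sum(s.points for s in statements if is_satisfied(answer, s))
    return earned / total


def naive_keyword_judge(answer: str, statement: GradedStatement) -> bool:
    # Stand-in judge for illustration only: case-insensitive substring match.
    # A real grader would assess legal meaning, not exact wording.
    return statement.text.lower() in answer.lower()


if __name__ == "__main__":
    reference = [
        GradedStatement("§ 19 EStG", 1.0),
        GradedStatement("Werbungskosten", 1.0),
        GradedStatement("EUR 12,000", 2.0),
    ]
    answer = "Income falls under § 19 EStG; Werbungskosten are deductible. Result: EUR 11,000."
    print(partial_credit_score(answer, reference, naive_keyword_judge))  # 0.5
```

Here the answer earns the two citation and terminology points but misses the two-point numerical result, so it scores 2 of 4 points. Swapping in a different `is_satisfied` function changes how statements are judged without touching the scoring arithmetic.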
Problem

Research questions and friction points this paper is trying to address.

legal reasoning
tax law
domain-specific LLM
formal rules
statutory citation
Innovation

Methods, ideas, or system contributions that make the work stand out.

domain-adapted LLM
legal reasoning benchmark
synthetic data generation
retrieval-augmented pipeline
tax law analysis
Sebastian Wind
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany; Erlangen National High Performance Computing Center, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany; DATEV eG, Nuremberg, Germany
Jeta Sopa
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Laurin Schmid
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany; Bavarian AI Taxation Laboratory, Department of Computer Science, University of Technology Nuremberg, Nuremberg, Germany
Quirin Jackl
Chair for Tax Law and Public Law, Friedrich-Alexander-Universität Erlangen-Nürnberg, Nuremberg, Germany
Sebastian Kiefer
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany; DATEV eG, Nuremberg, Germany
Fei Wu
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Martin Mayr
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany; Erlangen National High Performance Computing Center, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Harald Köstler
Erlangen National High Performance Computing Center, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany; Chair of Computer Science 10, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Gerhard Wellein
Friedrich-Alexander-Universität Erlangen-Nürnberg
HPC, Performance Modelling, Performance Engineering, Sparse Solvers and Kernels
Andreas Maier
Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany; Erlangen National High Performance Computing Center, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany
Soroosh Tayebi Arasteh
RWTH Aachen University
Deep Learning, AI in Medicine, Generative AI, Medical Image Analysis