AI Summary
Problem: The prevailing narrative of judicial formalism in Central and Eastern Europe (CEE) lacks empirical validation. Method: We introduce MADON, the first fine-grained annotated dataset of Czech judicial argumentation, and propose a multi-stage hybrid NLP framework that integrates continued pretraining of a Czech legal-domain ModernBERT, a lightweight Llama 3.1 module, asymmetric loss with class weighting, and interpretable feature-based classification. Contribution/Results: We establish the first computationally grounded classification of judicial philosophy, achieving strong performance on argumentative paragraph detection (macro-F1 = 82.6%), classification into eight argument types (77.5%), and formalistic decision classification (83.2%). Our findings challenge the dominant claim of pervasive judicial formalism in CEE, at least for the two Czech Supreme Courts studied. The methodology is designed to be replicable across jurisdictions and is fully open-sourced.
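All three reported scores are macro-F1, which averages the per-class F1 scores without weighting by class frequency, so rare classes count as much as frequent ones. The plain-Python sketch below illustrates why this matters for imbalanced data; the labels are hypothetical toy data, not drawn from MADON. A degenerate model that always predicts the majority class reaches 90% accuracy but only about 47% macro-F1.

```python
def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Undefined precision/recall (zero denominator) is treated as 0.0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical toy data: 9 non-argumentative paragraphs, 1 argumentative one,
# and a degenerate model that always predicts the majority class.
y_true = ["non-arg"] * 9 + ["arg"]
y_pred = ["non-arg"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.3f}")                  # accuracy = 0.900
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")  # macro-F1 = 0.474
```

In practice, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity for this example.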
Abstract
Courts must justify their decisions, but systematically analyzing judicial reasoning at scale remains difficult. This study refutes claims about formalistic judging in Central and Eastern Europe (CEE) by developing automated, state-of-the-art natural language processing methods to detect and classify judicial reasoning in decisions of the Czech Supreme Courts. We create the MADON dataset of 272 decisions from two Czech Supreme Courts, with expert annotations of 9,183 paragraphs for eight argument types and holistic formalism labels, for supervised training and evaluation. Using a corpus of 300k Czech court decisions, we adapt transformer LLMs to the Czech legal domain through continued pretraining and experiment with methods to address class imbalance, including asymmetric loss and class weighting. The best models successfully detect argumentative paragraphs (82.6% macro-F1), classify traditional types of legal argument (77.5% macro-F1), and classify decisions as formalistic or non-formalistic (83.2% macro-F1). Our three-stage pipeline combining ModernBERT, Llama 3.1, and traditional feature-based machine learning achieves promising results for decision classification while reducing computational costs and increasing explainability. Empirically, we challenge prevailing narratives about CEE formalism. This work demonstrates that legal argument mining enables reliable classification of judicial philosophy and highlights its potential for other important tasks in computational legal studies. Our methodology is easily replicable across jurisdictions, and our entire pipeline, datasets, guidelines, models, and source code are available at https://github.com/trusthlt/madon.
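The abstract names asymmetric loss and class weighting as remedies for class imbalance without spelling them out. The sketch below is a minimal illustration, assuming the multi-label asymmetric loss (ASL) formulation of Ridnik et al. (2021) applied to per-class sigmoid probabilities, combined with inverse-frequency class weights. The function names, default hyperparameters (`gamma_pos`, `gamma_neg`, `clip`), and the weighting scheme are illustrative assumptions, not the MADON implementation.

```python
import math
from collections.abc import Sequence


def inverse_frequency_weights(label_counts: Sequence[int]) -> list[float]:
    """Hypothetical helper: weight_c = N / (K * count_c), so rare classes get larger weights.

    Assumes every class occurs at least once (count_c > 0).
    """
    total = sum(label_counts)
    k = len(label_counts)
    return [total / (k * count) for count in label_counts]


def asymmetric_loss(
    probs: Sequence[float],
    targets: Sequence[int],
    class_weights: Sequence[float],
    gamma_pos: float = 0.0,
    gamma_neg: float = 4.0,
    clip: float = 0.05,
    eps: float = 1e-8,
) -> float:
    """Class-weighted asymmetric loss for one multi-label example (illustrative sketch)."""
    total = 0.0
    for p, y, w in zip(probs, targets, class_weights):
        if y == 1:
            # Positive term: (1 - p)^gamma_pos * log(p); gamma_pos = 0 keeps plain log-likelihood.
            term = (1.0 - p) ** gamma_pos * math.log(max(p, eps))
        else:
            # Probability shifting: negatives with p <= clip contribute nothing.
            p_m = max(p - clip, 0.0)
            # A larger gamma_neg down-weights easy negatives more strongly than hard ones.
            term = p_m ** gamma_neg * math.log(max(1.0 - p_m, eps))
        total += -w * term
    return total


weights = inverse_frequency_weights([90, 10])
print([round(w, 3) for w in weights])  # [0.556, 5.0]
```

Choosing `gamma_neg > gamma_pos` shifts the training signal away from the many easy negatives toward rare positive labels and hard negatives. With `gamma_pos = gamma_neg = 0`, `clip = 0`, and unit weights, the loss reduces to ordinary binary cross-entropy summed over classes.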