Probing BERT for German Compound Semantics

📅 2025-05-20
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This study investigates how well pre-trained German BERT models encode the compositional semantics of noun compounds, i.e. the degree to which a compound's meaning is transparent from its constituents. Addressing German's highly productive compounding and strong constituent-level ambiguity, the authors conduct the first systematic layer-wise probing analysis for this language: a supervised probe assesses how well compositionality can be predicted from each layer's representations, comparing cased and uncased models on an 868-item gold-standard compound dataset. Results show that early transformer layers carry the richest compositional semantic signal; however, overall probing performance lags substantially behind analogous English work, indicating that compositional semantic knowledge is harder to decode from German BERT. The work provides the first systematic layer-wise account of noun-compound compositionality in German BERT and empirical evidence relevant to cross-lingual semantic modeling.

πŸ“ Abstract
This paper investigates the extent to which pretrained German BERT encodes knowledge of noun compound semantics. We comprehensively vary combinations of target tokens, layers, and cased vs. uncased models, and evaluate them by predicting the compositionality of 868 gold standard compounds. Looking at representational patterns within the transformer architecture, we observe trends comparable to equivalent prior work on English, with compositionality information most easily recoverable in the early layers. However, our strongest results clearly lag behind those reported for English, suggesting an inherently more difficult task in German. This may be due to the higher productivity of compounding in German than in English and the associated increase in constituent-level ambiguity, including in our target compound set.
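The layer-wise probing protocol described above can be sketched in outline: for each transformer layer, extract compound representations, fit a simple supervised probe to predict gold compositionality ratings, and compare layers by held-out performance. The snippet below is a minimal illustrative sketch, not the authors' code: it uses synthetic embeddings in place of real German BERT vectors, a ridge regressor as a stand-in probe, and a toy assumption that the linear signal decays with depth; the 868-compound count is from the paper, everything else is hypothetical.

```python
# Illustrative layer-wise probing sketch (synthetic data, NOT the paper's setup).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_compounds, n_layers, dim = 868, 12, 64

# Gold compositionality ratings (random stand-ins for the 868 gold compounds).
gold = rng.uniform(1, 7, size=n_compounds)

def layer_embeddings(layer):
    """Simulate per-layer compound embeddings; early layers carry more signal."""
    signal = max(0.0, 1.0 - 0.08 * layer)  # toy assumption: signal decays with depth
    X = rng.normal(size=(n_compounds, dim))
    X[:, 0] = signal * gold + (1 - signal) * rng.normal(size=n_compounds)
    return X

# Probe each layer: cross-validated R^2 of a ridge regressor on gold ratings.
scores = []
for layer in range(n_layers):
    X = layer_embeddings(layer)
    r2 = cross_val_score(Ridge(alpha=1.0), X, gold, cv=5, scoring="r2").mean()
    scores.append(r2)

best = int(np.argmax(scores))
print(f"best layer: {best}, mean R^2 = {scores[best]:.2f}")
```

With real embeddings, `layer_embeddings` would instead run the compounds through German BERT with hidden states exposed and pool the target-token vectors per layer; the comparison across layers and across cased/uncased models stays the same.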
Problem

Research questions and friction points this paper is trying to address.

- Investigates German BERT's knowledge of noun compound semantics
- Evaluates compositionality prediction using 868 gold standard compounds
- Compares German and English BERT performance on compound semantics
Innovation

Methods, ideas, or system contributions that make the work stand out.

- Uses German BERT for compound semantics analysis
- Evaluates token-layer-model combinations comprehensively
- Compares German-English compositionality recovery trends
Filip Miletić
Institute for Natural Language Processing, University of Stuttgart, Germany
Aaron Schmid
Institute for Natural Language Processing, University of Stuttgart, Germany
Sabine Schulte im Walde
University of Stuttgart, Germany
Computational Linguistics