AI Summary
This study investigates how well German pre-trained BERT models encode the compositional semantics of noun compounds, particularly morphological transparency. Addressing German's highly productive compounding and strong constituent-level ambiguity, we conduct the first systematic layer-wise probing analysis for this language: we employ a classification-based semantic separability probe to assess how extractable compositional information is across layers, comparing cased and uncased models on an 868-item gold-standard compound dataset. Results show that early transformer layers carry the richest compositional semantic signals; however, overall probing performance is substantially lower than on analogous English tasks, confirming that compositional semantic knowledge in German BERT is significantly harder to decode. This work establishes the first German noun-compound compositionality benchmark and provides critical empirical evidence for cross-lingual semantic modeling.
Abstract
This paper investigates the extent to which pretrained German BERT encodes knowledge of noun compound semantics. We comprehensively vary combinations of target tokens, layers, and cased vs. uncased models, and evaluate them by predicting the compositionality of 868 gold standard compounds. Looking at representational patterns within the transformer architecture, we observe trends comparable to equivalent prior work on English, with compositionality information most easily recoverable in the early layers. However, our strongest results clearly lag behind those reported for English, suggesting an inherently more difficult task in German. This may be due to the higher productivity of compounding in German than in English and the associated increase in constituent-level ambiguity, including in our target compound set.
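The layer-wise probing setup described above can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: it uses synthetic embeddings in place of real German BERT hidden states (which would come from a library such as Hugging Face Transformers with `output_hidden_states=True`), a binarized compositionality label, and a cross-validated logistic-regression probe as a stand-in classifier. The decaying signal injected per layer merely mimics the reported finding that compositionality information is most recoverable in early layers.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_compounds, n_layers, dim = 200, 12, 32

# Synthetic stand-in for per-layer compound embeddings, shape
# (n_layers, n_compounds, dim). In the actual study these would be
# German BERT hidden states for the target compound tokens.
labels = rng.integers(0, 2, n_compounds)  # binarized compositionality
embeddings = rng.normal(size=(n_layers, n_compounds, dim))

# Inject a class signal on one dimension that decays with depth,
# mimicking early layers being the most informative (an assumption
# of this toy setup, matching the trend reported in the text).
for layer in range(n_layers):
    strength = 1.5 * (1 - layer / n_layers)
    embeddings[layer, labels == 1, 0] += strength

def probe_layer(X, y):
    """Cross-validated accuracy of a linear probe on one layer's embeddings."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, y, cv=5).mean()

scores = [probe_layer(embeddings[layer], labels) for layer in range(n_layers)]
best_layer = int(np.argmax(scores))
print(f"best layer: {best_layer}, accuracy: {scores[best_layer]:.2f}")
```

With real hidden states, the same loop over layers (and over cased vs. uncased models, and over choices of target token) yields the comparison grid the abstract describes; here the probe simply recovers the injected early-layer signal.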