🤖 AI Summary
This study investigates whether language models rely on abstract grammatical rules or memory-based associations when producing the German singular definite article. To this end, we apply GRADIEND, a gradient-based interpretability method, for the first time to syntactic generalization analysis, systematically examining parameter update directions and neuron activation patterns across different gender-case combinations. Our findings reveal that interventions targeting a specific gender-case pair often significantly affect seemingly unrelated combinations, with substantial overlap among the critical neurons involved. This suggests that model behavior arises from a hybrid mechanism combining rule-like encoding with memorized patterns. The work thus offers novel insights and methodological tools for understanding how large language models represent grammatical knowledge.
📝 Abstract
Language models perform well on grammatical agreement, but it is unclear whether this reflects rule-based generalization or memorization. We study this question for German definite singular articles, whose forms depend on gender and case. Using GRADIEND, a gradient-based interpretability method, we learn parameter update directions for gender-case-specific article transitions. We find that updates learned for one gender-case article transition frequently affect unrelated gender-case settings, with substantial overlap among the most affected neurons across settings. These results argue against a strictly rule-based encoding of German definite articles, indicating that models at least partly rely on memorized associations rather than abstract grammatical rules.
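To make the abstract's pipeline concrete, here is a minimal toy sketch of the general idea: learn an average gradient direction for one article transition and apply it as a parameter intervention, then check how scores shift. This is not the actual GRADIEND method; the linear scorer, the margin-style gradient, and all names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": a linear scorer with one weight row per definite article.
ARTICLES = ["der", "die", "das", "den", "dem", "des"]
DIM = 16
W = rng.normal(size=(len(ARTICLES), DIM))

def grad_for_transition(x, src, tgt):
    """Gradient of a margin-style loss pushing the target article's score
    above the source article's score for context features x."""
    g = np.zeros_like(W)
    g[ARTICLES.index(tgt)] = -x   # descending on g raises the target score
    g[ARTICLES.index(src)] = x    # ...and lowers the source score
    return g

# Sample contexts sharing a common feature (standing in for, say,
# masculine nominative contexts) plus noise.
base = rng.normal(size=DIM)
contexts = base + 0.1 * rng.normal(size=(200, DIM))

# Learn an update *direction* for one transition (e.g. "der" -> "den")
# by averaging gradients over the sampled contexts, then normalizing.
direction = np.mean([grad_for_transition(x, "der", "den") for x in contexts], axis=0)
direction /= np.linalg.norm(direction)

# Intervene: move the parameters along the learned direction.
W_new = W - 5.0 * direction

# The intervention raises "den" and lowers "der" on the shared context.
s_before, s_after = W @ base, W_new @ base
print(s_after[ARTICLES.index("den")] > s_before[ARTICLES.index("den")])  # True
print(s_after[ARTICLES.index("der")] < s_before[ARTICLES.index("der")])  # True
```

In the paper's setting the analogous question is whether such a learned update stays confined to its own gender-case setting; in this toy the gradient touches only two weight rows by construction, whereas the paper reports substantial spillover across settings in real models.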