Publications
Preprints: 'BabyBabelLM: A Multilingual Benchmark of Developmentally Plausible Training Data'.
Conference/Workshop Papers: 'Dialogue is not enough to make a communicative BabyLM (but neither is developmentally inspired reinforcement learning)'; 'Do Construction Distributions Shape Formal Language Learning In German BabyLMs?'; 'Subword models struggle with word learning, but surprisal hides it'; 'Small language models also work with small vocabularies: Probing the linguistic abilities of grapheme- and phoneme-based baby llamas'.
Research Experience
Working in the Computational Linguistics group at Bielefeld University; member of CRC 1646 – Linguistic Creativity in Communication. Previously helped develop the corpus annotation tool Hexatomic and worked in the English department at Jena University.
Education
PhD: Bielefeld University, Computational Linguistics group (CLAUSE), supervised by Prof. Sina Zarrieß. Master's/Bachelor's: Friedrich Schiller University Jena (English/American Studies and Computer Science) and Katholieke Universiteit Leuven in Belgium.
Background
Research interests: the relationship between linguistics (especially usage-based and cognitive approaches) and natural language processing. Current research focuses on small language models and how they compare to child language development, particularly from a multilingual perspective.