Problem
Models language acquisition using data from birth onward
Provides multilingual, developmentally plausible training datasets
Facilitates cognitive modeling and multilingual pretraining
Innovation
Curation of multilingual, developmentally plausible pretraining data
Covers 45 languages, with data sized in 100M English-word equivalents
Evaluation suites and baseline models for cognitive modeling