Scholar
Catherine Arnett
Google Scholar ID: gIDJdFAAAAAJ
Researcher, EleutherAI
NLP
multilingual NLP
computational linguistics
Citations & Impact
All-time
Citations
160
H-index
7
i10-index
4
Publications
16
Co-authors
8
list available
Contact
Twitter
GitHub
Publications
10 items
How Open Must Language Models be to Enable Reliable Scientific Inference?
2026
Cited
0
Weight Tying Biases Token Embeddings Towards the Output Space
2026
Cited
0
CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data
2026
Cited
0
Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures
2025
Cited
0
Disaggregation Reveals Hidden Training Dynamics: The Case of Agreement Attraction
2025
Cited
0
Explaining and Mitigating Crosslingual Tokenizer Inequities
2025
Cited
0
Evaluating Morphological Alignment of Tokenizers in 70 Languages
2025
Cited
0
Common Corpus: The Largest Collection of Ethical Data for LLM Pre-Training
2025
Cited
0
Resume (English only)
Research Experience
Formerly Lead Research Scientist at PleIAs; currently an NLP Researcher at EleutherAI.
Education
Received a PhD in Linguistics with a specialization in Computational Social Science from UC San Diego.
Background
An NLP Researcher with an interest in cross-lingual and multilingual NLP.
Miscellany
Active on Twitter and Bluesky; has recently published a new blog post.
Co-authors
8 total
Benjamin Bergen
Professor of Cognitive Science, UC San Diego
Tyler A. Chang
Google DeepMind
James A. Michaelov
Massachusetts Institute of Technology
Ivan P. Yamshchikov
Research Professor at CAIRO, THWS
Eliot Jones
Head of Offensive Cybersecurity, Gray Swan AI
Pavel Chizhov
Researcher and Ph.D. Student at cairo.thws
Sean Trott
Assistant Teaching Professor, UC San Diego
Co-author 8