Scholar

Catherine Arnett

Google Scholar ID: gIDJdFAAAAAJ

Researcher, EleutherAI

NLP, multilingual NLP, computational linguistics


Citations & Impact

All-time citations: 160

H-index

i10-index

Publications

Co-authors

list available

Contact

Twitter, GitHub

Publications

12 items

Position: Don't Just "Fix it in Post": A Science of AI Must Study Training Dynamics

2026


Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs

2026


How Open Must Language Models be to Enable Reliable Scientific Inference?

2026


Weight Tying Biases Token Embeddings Towards the Output Space

2026


CommonLID: Re-evaluating State-of-the-Art Language Identification Performance on Web Data

2026


Global PIQA: Evaluating Physical Commonsense Reasoning Across 100+ Languages and Cultures

2025


Disaggregation Reveals Hidden Training Dynamics: The Case of Agreement Attraction

2025


Explaining and Mitigating Crosslingual Tokenizer Inequities

2025


Resume (English only)

Research Experience

Formerly Lead Research Scientist at PleIAs; currently an NLP Researcher at EleutherAI.

Education

Received a PhD in Linguistics with a specialization in Computational Social Science from UC San Diego.

Background

An NLP Researcher with an interest in cross-lingual and multilingual NLP.

Miscellany

Active on Twitter and Bluesky; recently published a new blog post.

Co-authors

8 total

Benjamin Bergen

Professor of Cognitive Science, UC San Diego

Tyler A. Chang

Google DeepMind

James A. Michaelov

Massachusetts Institute of Technology

Ivan P. Yamshchikov

Research Professor at CAIRO, THWS

Eliot Jones

Head of Offensive Cybersecurity, Gray Swan AI

Pavel Chizhov

Researcher and Ph.D. Student at cairo.thws

Sean Trott

Assistant Teaching Professor, UC San Diego
