Yonatan Belinkov

Google Scholar ID: K-6ujU4AAAAJ

Technion

Natural Language Processing, Model Interpretability, Artificial Intelligence

Citations & Impact

All-time

Citations: 17,711

H-index

i10-index: 103

Publications


Contact

Email: belinkov@technion.ac.il

Publications

45 items

Differentiable Faithfulness Alignment for Cross-Model Circuit Transfer

2026

Reasoning Models Know What's Important, and Encode It in Their Activations

2026

From Feelings to Metrics: Understanding and Formalizing How Users Vibe-Test LLMs

2026

Masked by Consensus: Disentangling Privileged Knowledge in LLM Correctness

2026

Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism

2026

Pitfalls in Evaluating Interpretability Agents

2026

Between the Layers Lies the Truth: Uncertainty Estimation in LLMs Using Intra-Layer Local Information Scores

2026

Induction Meets Biology: Mechanisms of Repeat Detection in Protein Language Models

2026

Resume (English only)

Academic Achievements

- ICLR 2025: Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
- ACL 2024: Diffusion Lens: Interpreting Text Encoders in Text-to-Image Pipelines
- ACL 2023: BLIND: Bias Removal With No Demographics
- NeurIPS 2022: Locating and Editing Factual Associations in GPT
- NeurIPS 2020: Investigating Gender Bias in Language Models Using Causal Mediation Analysis
- ACL 2019: Don't Take the Premise for Granted: Mitigating Artifacts in Natural Language Inference
- ICLR 2018: Synthetic and Natural Noise Both Break Neural Machine Translation
- NIPS 2017: Analyzing Hidden Representations in End-to-End Automatic Speech Recognition Systems
- ACL 2017: What do Neural Machine Translation Models Learn about Morphology?

Research Experience

- Faculty member at Technion Taub Faculty of Computer Science
- Postdoc at Harvard SEAS, working with Stuart Shieber, affiliated with the Mind, Brain, Behavior initiative
- Worked with the NLP group and CCNLab at Harvard
- Worked with the Spoken Language Systems group at MIT CSAIL
- Software engineer at IntuView

Education

- PhD: Massachusetts Institute of Technology (MIT) CSAIL, Advisor: James Glass, Thesis topic: analysis of internal language representations in deep learning models, with particular applications to machine translation and speech recognition
- Master's: Tel Aviv University, Arabic Studies, Thesis: The Arabic dialect of Jisir izZarga

Background

- Research interests: Artificial Intelligence and Machine Learning, especially Large Language Models and other Natural Language Processing models
- Main research areas: interpretability; robustness, safety, and controllability; emergent and multi-agent communication; biological language models; NLP for Hebrew and Arabic

Miscellany