Hadas Orgad

Google Scholar ID: xWntyLkAAAAJ

PhD student, Technion

natural language processing, deep learning, fairness, robustness, explainability

Citations & Impact

Citations (all-time): 855

Publications

9 items

Interpretability Can Be Actionable (2026)

Hidden Failures in Robustness: Why Supervised Uncertainty Quantification Needs Better Evaluation (2026)

Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism (2026)

Agents of Chaos (2026)

MIB: A Mechanistic Interpretability Benchmark (2025)

Inside-Out: Hidden Factual Knowledge in LLMs (2025)

Position-aware Automatic Circuit Discovery (2025)

Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models (2025)

Resume

Academic Achievements

Publications include 'Inside-Out: Hidden Factual Knowledge in LLMs' (COLM 2025), 'LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations' (ICLR 2025), and 'MIB: A Mechanistic Interpretability Benchmark' (ICML 2025), among others.

Research Experience

Currently a Research Fellow at the Kempner Institute for the Study of Natural and Artificial Intelligence at Harvard University. Previously worked at Microsoft for 3.5 years on AI solutions for cloud security and NLP applications for security-related problems.

Education

Completed B.Sc. and M.Sc. degrees at the Technion – Israel Institute of Technology, followed by a Ph.D. under the supervision of Yonatan Belinkov. Selected as a 2023 Apple Scholar in AI/ML.

Background

Research focuses on the internals of AI systems, particularly how interpretability can be used to improve robustness, safety, and trustworthiness. Problems studied include hallucinations, bias, and unsafe outputs.

Miscellany

Passionate about AI interpretability and open to connecting with others for brainstorming or collaboration.
