Scholar
Zhengxuan Wu
Google Scholar ID: CBvE6lwAAAAJ
Stanford University
natural language processing
mechanistic interpretability
Citations & Impact (all-time)
Citations: 2,661
H-index: 25
i10-index: 33
Publications: 20
Co-authors: 8 (list available)
Contact
Email
wuzhengx@cs.stanford.edu
Publications (8 items)
ADAG: Automatically Describing Attribution Graphs (2026). Cited: 0
Language Model Circuits Are Sparse in the Neuron Basis (2026). Cited: 0
LLMs Encode Harmfulness and Refusal Separately (2025). Cited: 0
HyperSteer: Activation Steering at Scale with Hypernetworks (2025). Cited: 0
Improved Representation Steering for Language Models (2025). Cited: 0
GIM: Improved Interpretability for Large Language Models (2025). Cited: 0
AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders (2025). Cited: 0
Causal Abstraction: A Theoretical Foundation for Mechanistic Interpretability (2023). Cited: 49
Resume (English only)
Academic Achievements
Improved representation steering for language models, NeurIPS 2025 (Spotlight), equal contribution
AxBench: Steering LLMs? Even simple baselines outperform sparse autoencoders, ICML 2025 (Spotlight), equal contribution
ReFT: Representation finetuning for language models, NeurIPS 2024 (Spotlight), equal contribution
pyvene: A library for understanding and improving PyTorch models via interventions, NAACL 2024
Interpretability at scale: Identifying causal mechanisms in Alpaca, NeurIPS 2023, equal contribution
Co-authors (8 total)
Christopher Potts: Professor of Linguistics and, by courtesy, of Computer Science
Christopher D Manning: Professor of Computer Science and Linguistics, Stanford University
Noah D. Goodman: Stanford University
Thomas Icard: C.I. Lewis Professor of Philosophy and Professor of Computer Science (courtesy), Stanford University
Desmond C. Ong: Assistant Professor of Psychology, The University of Texas at Austin
Aryaman Arora: Stanford University
Dan Jurafsky: Professor of Linguistics and Computer Science, Stanford University
Atticus Geiger: Pr(Ai)²R Group