Zhengxuan Wu

Google Scholar ID: CBvE6lwAAAAJ

Stanford University

natural language processing, mechanistic interpretability

Citations & Impact

All-time

Citations: 2,661

H-index: 25

i10-index: 33

Publications: 20

Co-authors: 8

Contact

Email: wuzhengx@cs.stanford.edu

Publications

10 items

The Piggyback Hypothesis of Generalization: Explaining and Mitigating Emergent Misalignment (2026). Cited: 0

PreFT: Prefill-only finetuning for efficient inference (2026). Cited: 0

ADAG: Automatically Describing Attribution Graphs (2026). Cited: 0

Language Model Circuits Are Sparse in the Neuron Basis (2026). Cited: 0

LLMs Encode Harmfulness and Refusal Separately (2025). Cited: 0

HyperSteer: Activation Steering at Scale with Hypernetworks (2025). Cited: 0

Improved Representation Steering for Language Models (2025). Cited: 0

GIM: Improved Interpretability for Large Language Models (2025). Cited: 0

Resume (English only)

Academic Achievements

Improved representation steering for language models, NeurIPS 2025 (Spotlight), *equal contribution
AxBench: Steering LLMs? Even simple baselines outperform sparse autoencoders, ICML 2025 (Spotlight), *equal contribution
ReFT: Representation finetuning for language models, NeurIPS 2024 (Spotlight), *equal contribution
pyvene: A library for understanding and improving PyTorch models via interventions, NAACL 2024
Interpretability at scale: Identifying causal mechanisms in Alpaca, NeurIPS 2023, *equal contribution

Co-authors

8 total

Christopher Potts, Professor of Linguistics and, by courtesy, of Computer Science

Christopher D Manning, Professor of Computer Science and Linguistics, Stanford University

Noah D. Goodman, Stanford University

C.I. Lewis Professor of Philosophy and Professor of Computer Science (courtesy), Stanford University

Assistant Professor of Psychology, The University of Texas at Austin

Stanford University

Professor of Linguistics and Computer Science, Stanford University

Pr(Ai)²R Group