Zhengxuan Wu

Google Scholar ID: CBvE6lwAAAAJ
Stanford University
natural language processing, mechanistic interpretability
Citations & Impact (all-time)
  • Citations: 2,661
  • H-index: 25
  • i10-index: 33
  • Publications: 20
  • Co-authors: 8 (list available)
Resume (English only)
Academic Achievements
  • Improved representation steering for language models, NeurIPS 2025 (Spotlight; equal contribution)
  • AxBench: Steering LLMs? Even simple baselines outperform sparse autoencoders, ICML 2025 (Spotlight; equal contribution)
  • ReFT: Representation finetuning for language models, NeurIPS 2024 (Spotlight; equal contribution)
  • pyvene: A library for understanding and improving PyTorch models via interventions, NAACL 2024
  • Interpretability at scale: Identifying causal mechanisms in Alpaca, NeurIPS 2023 (equal contribution)