Leon Lang
Google Scholar ID: E3ae_sMAAAAJ
PhD Student, University of Amsterdam
AI Safety and Alignment
Links: Homepage · Google Scholar
Citations & Impact (all-time)
Citations: 407
h-index: 5
i10-index: 4
Publications: 14
Co-authors: 23
Contact
Email: l.lang@uva.nl
Profiles: CV · Twitter · GitHub · LinkedIn
Publications (showing 3 of 14)
Modeling Human Beliefs about AI Behavior for Scalable Oversight · 2025 · Cited: 0
The Perils of Optimizing Learned Reward Functions: Low Training Error Does Not Guarantee Low Regret · arXiv.org · 2024 · Cited: 1
Information Decomposition Diagrams Applied beyond Shannon Entropy: A Generalization of Hu's Theorem · arXiv.org · 2022 · Cited: 4
Resume (English only)
Academic Achievements
Published multiple papers on AI alignment, risks of optimizing learned reward functions, and partial observability challenges in RLHF.
Proposed a general method to build E(N)-equivariant steerable CNNs based on the Wigner-Eckart theorem.
Generalized Hu's Theorem for information decomposition beyond Shannon entropy, including Kolmogorov complexity and generalization error.
Developed factored space models as a new foundation for causality across abstraction levels.
Researched modeling human beliefs about AI behavior to improve scalable oversight.
Co-authors (23 total)
Maurice Weiler · University of Amsterdam
Gabriele Cesa · University of Amsterdam, Qualcomm AI Research
Erik Jenner · Google DeepMind
Patrick Forré · Associate Professor of Stochastics, University of Amsterdam
Anca D. Dragan · Assistant Professor at UC Berkeley // Director, AI Safety and Alignment, Google DeepMind
Scott Emmons · Google DeepMind