Scholar
Dylan Hadfield-Menell
Google Scholar ID: 4mVPFQ8AAAAJ
Massachusetts Institute of Technology
Artificial Intelligence
Citations & Impact
All-time
Citations
5,039
H-index
34
i10-index
56
Publications
20
Co-authors
28
list available
Publications
17 items
Alignment Tampering: How Reinforcement Learning from Human Feedback Is Exploited to Optimize Misaligned Biases
2026
Cited
0
Distributional Process Reward Models: Calibrated Prediction of Future Rewards via Conditional Optimal Transport
2026
Cited
0
Safety Drift After Fine-Tuning: Evidence from High-Stakes Domains
2026
Cited
0
The Prosocial Ranking Challenge: Reducing Polarization on Social Media without Sacrificing Engagement
2026
Cited
0
Prompt Injection as Role Confusion
2026
Cited
0
Surgical Activation Steering via Generative Causal Mediation
2026
Cited
0
Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia
2025
Cited
2
Open-Universe Assistance Games
2025
Cited
0
Resume (English only)
Background
Associate Professor of EECS at MIT
Head of the Algorithmic Alignment Group at CSAIL
Research focuses on AI alignment—ensuring AI systems behave in accordance with human and societal values
Works on alignment challenges in multi-agent systems, human-AI teams, and societal oversight of machine learning
Aims to enable safe, beneficial, and trustworthy real-world deployment of AI
Co-authors
28 total
Anca D Dragan
Assistant Professor at UC Berkeley // Director, AI Safety and Alignment, Google DeepMind
Stephen Casper
PhD student, MIT
Pieter Abbeel
UC Berkeley | Covariant
Gillian K. Hadfield
Johns Hopkins University, Dept of Computer Science and School of Government and Policy
Andreas Haupt
Stanford University
Smitha Milli
Meta FAIR
Thomas L. Griffiths
Professor of Psychology and Computer Science, Princeton University