Dylan Hadfield-Menell
Google Scholar ID: 4mVPFQ8AAAAJ
Massachusetts Institute of Technology
Artificial Intelligence
Citations & Impact
All-time
Citations: 5,039
H-index: 34
i10-index: 56
Publications: 20
Co-authors: 28 (list available)
Publications
14 items
The Prosocial Ranking Challenge: Reducing Polarization on Social Media without Sacrificing Engagement (2026). Cited: 0
Prompt Injection as Role Confusion (2026). Cited: 0
Surgical Activation Steering via Generative Causal Mediation (2026). Cited: 0
Evaluating Generalization Capabilities of LLM-Based Agents in Mixed-Motive Scenarios Using Concordia (2025). Cited: 2
Open-Universe Assistance Games (2025). Cited: 0
CALMA: A Process for Deriving Context-aligned Axes for Language Model Alignment (2025). Cited: 0
Layered Unlearning for Adversarial Relearning (2025). Cited: 0
Randomness, Not Representation: The Unreliability of Evaluating Cultural Alignment in LLMs (2025). Cited: 0
Resume (English only)
Background
Associate Professor of EECS at MIT
Head of the Algorithmic Alignment Group at CSAIL
Research focuses on AI alignment—ensuring AI systems behave in accordance with human and societal values
Works on alignment challenges in multi-agent systems, human-AI teams, and societal oversight of machine learning
Aims to enable safe, beneficial, and trustworthy real-world deployment of AI
Co-authors
28 total
Anca D Dragan
Assistant Professor at UC Berkeley // Director, AI Safety and Alignment, Google DeepMind
Stephen Casper
PhD student, MIT
Pieter Abbeel
UC Berkeley | Covariant
Gillian K. Hadfield
Johns Hopkins University, Dept of Computer Science and School of Government and Policy
Andreas Haupt
Stanford University
Smitha Milli
Meta FAIR
Thomas L. Griffiths
Professor of Psychology and Computer Science, Princeton University