Scholar
Marcus Williams
Google Scholar ID: q3atvBMAAAAJ
OpenAI
AI safety
Alignment
Reward hacking
Deception
Citations & Impact
Citations (all-time): 48
H-index: 3
i10-index: 1
Publications: 6
Co-authors: 0
Contact
No contact links provided.
Publications
4 items
Monitoring Monitorability
2025 · Cited: 0
CTRL-Rec: Controlling Recommender Systems With Natural Language
2025 · Cited: 0
Stress Testing Deliberative Alignment for Anti-Scheming Training
2025 · Cited: 0
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
arXiv.org · 2024 · Cited: 1