Mrinank Sharma
Google Scholar ID: 5gslw-MAAAAJ
Anthropic
AI Safety
Machine Learning
Artificial Intelligence
Homepage ↗
Google Scholar ↗
Citations & Impact
All-time
Citations: 3,611
H-index: 18
i10-index: 24
Publications: 20
Co-authors: 10
Publications
9 items
Who's in Charge? Disempowerment Patterns in Real-World LLM Usage · 2026 · Cited 0
Eliciting Harmful Capabilities by Fine-Tuning On Safeguarded Outputs · 2026 · Cited 3
Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks · arXiv.org · 2026 · Cited 2
Chain-of-Thought Hijacking · 2025 · Cited 0
Towards Safeguarding LLM Fine-tuning APIs against Cipher Attacks · 2025 · Cited 0
Forecasting Rare Language Model Behaviors · 2025 · Cited 0
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming · 2025 · Cited 0
PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning · arXiv.org · 2024 · Cited 8
Resume (English only)
Co-authors
10 total
Ethan Perez · Anthropic
Samuel R. Bowman · Anthropic and NYU
David Duvenaud · Associate Professor, University of Toronto
Tom Rainforth · Associate Professor, University of Oxford
Evan Hubinger · Member of Technical Staff, Anthropic
Erik Jones · UC Berkeley
Jesse Mu · Anthropic
Buck Shlegeris · CEO, Redwood Research