Mrinank Sharma

Google Scholar ID: 5gslw-MAAAAJ
Anthropic
AI Safety · Machine Learning · Artificial Intelligence
Citations & Impact
All-time
Citations: 3,611
H-index: 18
i10-index: 24
Publications: 20
Co-authors: 10 (list available)
Publications
9 items
Who's in Charge? Disempowerment Patterns in Real-World LLM Usage (2026). Cited: 0
Eliciting Harmful Capabilities by Fine-Tuning On Safeguarded Outputs (2026). Cited: 3
Constitutional Classifiers++: Efficient Production-Grade Defenses against Universal Jailbreaks (arXiv.org, 2026). Cited: 2
Chain-of-Thought Hijacking (2025). Cited: 0
Towards Safeguarding LLM Fine-tuning APIs against Cipher Attacks (2025). Cited: 0
Forecasting Rare Language Model Behaviors (2025). Cited: 0
Constitutional Classifiers: Defending against Universal Jailbreaks across Thousands of Hours of Red Teaming (2025). Cited: 0
PoisonBench: Assessing Large Language Model Vulnerability to Data Poisoning (arXiv.org, 2024). Cited: 8
Co-authors
10 total
Ethan Perez
Anthropic
Samuel R. Bowman
Anthropic and NYU
David Duvenaud
Associate Professor, University of Toronto
Tom Rainforth
Associate Professor, University of Oxford
Evan Hubinger
Member of Technical Staff, Anthropic
Erik Jones
UC Berkeley
Jesse Mu
Anthropic
Buck Shlegeris
CEO, Redwood Research