David Dobre
Google Scholar ID: 5CZ6nlYAAAAJ
PhD student, University of Montreal/Mila
deep learning
adversarial robustness
in-context learning
reasoning
mechanistic interpretability
Citations & Impact (all-time)
Citations: 320
H-index: 9
i10-index: 8
Publications: 15
Co-authors: 0
Contact
No contact links provided.
Publications (3 items)
A generative approach to LLM harmfulness detection with special red flag tokens
2025 · Cited by 0
Learning diverse attacks on large language models for robust red-teaming and safety tuning
arXiv.org · 2024 · Cited by 9
Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
Neural Information Processing Systems · 2024 · Cited by 36