Alexander Panfilov

Google Scholar ID: M65_TPEAAAAJ
ELLIS Institute Tübingen, Max Planck Institute for Intelligent Systems
Machine Learning, Trustworthy ML, AI Safety
Citations & Impact (all-time)
  • Citations: 82
  • H-index: 5
  • i10-index: 3
  • Publications: 12
  • Co-authors: 8
Resume
Academic Achievements
  • Publications: 'Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols', 'Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs', 'Capability-Based Scaling Laws for LLM Red-Teaming', 'An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks'.
  • Conference presentations: ICML 2025, ICLR 2025, and NeurIPS 2024, among others.
Research Experience
  • Research projects: Jailbreaking attacks on LLMs, including threat models and the effectiveness of jailbreaks, among other topics.
Education
  • Degree: PhD; Institution: ELLIS Institute Tübingen / Max Planck Institute for Intelligent Systems; Start date: May 1, 2024.
Background
  • Research interests: adversarial robustness, AI safety, and ML security. Bio: I am a second-year ELLIS / IMPRS-IS PhD student, advised by Jonas Geiping and Maksym Andriushchenko.
Miscellany
  • Personal interests: Enjoys finding ways to break machine learning systems.