Publications: 'Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols', 'Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLM', 'Capability-Based Scaling Laws for LLM Red-Teaming', 'An Interpretable N-gram Perplexity Threat Model for Large Language Model Jailbreaks'. Conference presentations: Presented at international conferences including ICML 2025, ICLR 2025, and NeurIPS 2024.
Research Experience
Research projects: Focus on jailbreaking attacks on LLMs, including threat models for jailbreaks and the effectiveness of attacks against safety-trained models.
Education
Degree: PhD; Institution: ELLIS Institute Tübingen / Max Planck Institute for Intelligent Systems; Start date: May 1, 2024.
Background
Research interests: adversarial robustness, AI safety, and ML security. Bio: I am a second-year ELLIS / IMPRS-IS PhD student, advised by Jonas Geiping and Maksym Andriushchenko.
Miscellany
Personal interests: Enjoys finding ways to break machine learning systems.