Sahar Abdelnabi

Google Scholar ID: QEiYbDYAAAAJ
AI Security Researcher, Microsoft
AI Security, AI Safety, Adversarial Machine Learning, LLMs
Citations & Impact
All-time
Citations: 2,034
H-index: 12
i10-index: 14
Publications: 20
Co-authors: 35
Resume
Academic Achievements
  • First to identify, coin, and taxonomize the indirect prompt injection vulnerability in LLM-integrated applications (2023).
  • Proposed and called for watermarking of generative AI for language and vision (2020).
  • Received a Best Paper Award at ACL 2025 for work on LLM sampling heuristics.
Research Experience
  • Previously an AI security researcher at Microsoft; currently leads the COMPASS research group, which focuses on safe, aligned, and steerable AI agents.
  • Research areas: understanding, probing, and evaluating the failure modes of AI models, including their biases, emergent risks, and misuse scenarios; designing mitigations, system defenses, white-box control methods, and reasoning enhancements to counter such risks; and leveraging AI agents for good, in scientific discovery and in advancing society.
Education
  • Completed a PhD at CISPA Helmholtz Center for Information Security, advised by Prof. Dr. Mario Fritz; obtained an MSc degree at Saarland University.
Background
  • Principal Investigator at the ELLIS Institute Tübingen and independent research group leader at the Max Planck Institute for Intelligent Systems and the Tübingen AI Center. Leads the COMPASS research group, which focuses on developing safe, aligned, and steerable AI agents with an emphasis on security, human aspects, and cooperative multi-agent systems. Research interests span the broad intersection of AI with security, safety, and sociopolitical aspects.
Miscellany
  • Open to broad topics on A(G)I safety and security, interpretability, reasoning, evals, contextual integrity, agentic risks and opportunities, multi-agent dynamics, agents with long-term memory, self-improving agents, (deceptive) alignment, situational awareness, manipulation and deception.