Sahar Abdelnabi

Google Scholar ID: QEiYbDYAAAAJ

AI Security Researcher, Microsoft

AI Security, AI Safety, Adversarial Machine Learning, LLMs

Citations & Impact

Citations (all-time): 2,034

Co-authors: list available

Contact

Email: sahar.abdelnabi@tue.ellis.eu

Publications

22 items

Models That Know How Evaluations Are Designed Score Safer (2026)

Measuring Security Without Fooling Ourselves: Why Benchmarking Agents Is Hard (2026)

Decomposing and Measuring Evaluation Awareness (2026)

AI Agents May Always Fall for Prompt Injections (2026)

Hidden in Memory: Sleeper Memory Poisoning in LLM Agents (2026)

No More, No Less: Task Alignment in Terminal Agents (2026)

Skill-Inject: Measuring Agent Vulnerability to Skill File Attacks (2026)

Colosseum: Auditing Collusion in Cooperative Multi-Agent Systems (2026)

Resume (English only)

Academic Achievements

First to identify, name, and taxonomize the indirect prompt injection vulnerability in LLM-integrated applications (2023); proposed and called for watermarking of generative AI for language and vision (2020); work on LLM sampling heuristics received a Best Paper Award at ACL 2025.

Research Experience

Previously an AI security researcher at Microsoft. Currently leads the COMPASS research group, which focuses on safe, aligned, and steerable AI agents. Research areas include understanding, probing, and evaluating the failure modes of AI models, including their biases, emergent risks, and misuse scenarios; designing mitigations, system defenses, white-box control methods, and reasoning enhancements to counter those risks; and applying AI agents for good, in scientific discovery and the advancement of society.

Education

Completed a PhD at CISPA Helmholtz Center for Information Security, advised by Prof. Dr. Mario Fritz; obtained an MSc degree at Saarland University.

Background

Principal Investigator at the ELLIS Institute Tübingen and independent research group leader at the Max Planck Institute for Intelligent Systems and the Tübingen AI Center. Leads the COMPASS research group, which develops safe, aligned, and steerable AI agents with an emphasis on security, human aspects, and cooperative multi-agent systems. Research interests span the intersection of AI with security, safety, and sociopolitical aspects.

Miscellany

Open to broad topics on A(G)I safety and security, interpretability, reasoning, evals, contextual integrity, agentic risks and opportunities, multi-agent dynamics, agents with long-term memory, self-improving agents, (deceptive) alignment, situational awareness, manipulation and deception.

Co-authors

35 total