First to identify, coin the term for, and taxonomize the indirect prompt injection vulnerability in LLM-integrated applications (2023); proposed and called for watermarking generative AI for language and vision in 2020; work on LLM sampling heuristics received a Best Paper Award at ACL 2025.
Research Experience
Previously an AI security researcher at Microsoft. Currently leading the COMPASS research group, which focuses on safe, aligned, and steerable AI agents. Research areas include understanding, probing, and evaluating the failure modes of AI models, their biases, emergent risks, and misuse scenarios; designing mitigations, system defenses, white-box control methods, and reasoning enhancements to counter such risks; and leveraging AI agents for good, such as scientific discovery and advancing society.
Education
Completed a PhD at the CISPA Helmholtz Center for Information Security, advised by Prof. Dr. Mario Fritz; obtained an MSc degree from Saarland University.
Background
A Principal Investigator at the ELLIS Institute Tübingen and an independent research group leader at the Max Planck Institute for Intelligent Systems and the Tübingen AI Center, leading the COMPASS research group, which develops safe, aligned, and steerable AI agents with an emphasis on security, human aspects, and cooperative multi-agent systems. Research interests span the broad intersection of AI with security, safety, and sociopolitical aspects.
Miscellany
Open to a broad range of topics in A(G)I safety and security, including interpretability, reasoning, evals, contextual integrity, agentic risks and opportunities, multi-agent dynamics, agents with long-term memory, self-improving agents, (deceptive) alignment, situational awareness, and manipulation and deception.