Scholar
Francis Rhys Ward
Google Scholar ID: i98avZYAAAAJ
Imperial College London
AI alignment
deception
safety evaluations
Homepage
Google Scholar
Citations & Impact
All-time
Citations
206
H-index
7
i10-index
5
Publications
17
Co-authors
15
Contact
No contact links provided.
Publications
8 items
How does information access affect LLM monitors' ability to detect sabotage?
2026
Cited
0
Password-Activated Shutdown Protocols for Misaligned Frontier Agents
2025
Cited
0
Reasoning Under Pressure: How do Training Incentives Influence Chain-of-Thought Monitorability?
2025
Cited
0
CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D
2025
Cited
0
Higher-Order Belief in Incomplete Information MAIDs
2025
Cited
0
The Elicitation Game: Evaluating Capability Elicitation Techniques
2025
Cited
0
Towards a Theory of AI Personhood
2025
Cited
0
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
arXiv.org · 2024
Cited
11
Co-authors
15 total
Francesco Belardinelli
Imperial College London
Francesca Toni
Imperial College London
Teun van der Weij
Research Scientist
Tom Everitt
Staff Research Scientist at Google DeepMind
Samuel F. Brown
Unknown affiliation
Matt MacDermott
Imperial College London/Mila/LawZero
Ibrahim Habli
Professor of Safety-Critical Systems at the University of York
Loic Le Folgoc
Associate Professor, LTCI, Télécom Paris, France