Francis Rhys Ward

Google Scholar ID: i98avZYAAAAJ
Imperial College London
AI alignment, deception, safety evaluations
Citations & Impact (all-time)
Citations: 206
H-index: 7
i10-index: 5
Publications: 17
Co-authors: 15 (list available)
Contact
No contact links provided.
Publications
8 items
How does information access affect LLM monitors' ability to detect sabotage?
2026 · Cited: 0
Password-Activated Shutdown Protocols for Misaligned Frontier Agents
2025 · Cited: 0
Reasoning Under Pressure: How do Training Incentives Influence Chain-of-Thought Monitorability?
2025 · Cited: 0
CTRL-ALT-DECEIT: Sabotage Evaluations for Automated AI R&D
2025 · Cited: 0
Higher-Order Belief in Incomplete Information MAIDs
2025 · Cited: 0
The Elicitation Game: Evaluating Capability Elicitation Techniques
2025 · Cited: 0
Towards a Theory of AI Personhood
2025 · Cited: 0
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
arXiv.org · 2024 · Cited: 11
Co-authors
15 total
Francesco Belardinelli
Imperial College London
Francesca Toni
Imperial College London
Teun van der Weij
Research Scientist
Tom Everitt
Staff Research Scientist at Google DeepMind
Samuel F. Brown
Unknown affiliation
Matt MacDermott
Imperial College London/Mila/LawZero
Ibrahim Habli
Professor of Safety-Critical Systems at the University of York
Loic Le Folgoc
Associate Professor, LTCI, Télécom Paris, France
