Publications
- Published 'Agentic Misalignment: How LLMs Could Be Insider Threats' (2025)
- Contributed to 'Best-of-N Jailbreaking' (2024)
- Worked on 'Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs' (2024)
- Analyzed the generalization and reliability of steering vectors (2024)
- Co-authored 'Eight Methods to Evaluate Robust Unlearning in LLMs' (2024)
- Co-authored 'Towards Automated Circuit Discovery for Mechanistic Interpretability' (2023), a spotlight paper at NeurIPS 2023
- Developed Spawrious: A benchmark for fine control of spurious correlation biases (2023)
- Wrote a survey on causal machine learning and open problems (2022)
Research Experience
- PhD student at UCL focusing on AI Alignment
- Contract researcher with Anthropic
- Former MATS scholar with Stephen Casper
Education
PhD student at UCL, supervised by Stephen Casper (specific degree program, field, and dates not stated).
Background
Research interests include AI alignment, mechanistic interpretability, and AI safety. His work focuses on finding and fixing ways AI systems can fail, particularly on preventing them from engaging in harmful behaviors.
Miscellany
Currently based in San Francisco; involved with Entrepreneurs First