- Multiple papers accepted at top conferences like NeurIPS and ICLR
- Representative Publications: 'Beyond Linear Probes: Dynamic Safety Monitoring for Language Models', 'Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders', and others
Research Experience
- 2025.10 - 2026.02: Postdoctoral Research Assistant, University of Oxford
- 2025.05 - 2025.09: Visiting Student, University of Oxford
- 2025.04 - 2025.09: Research Associate, AIGI Oxford
- 2024.09 - 2024.12: Honorary Associate, University of Wisconsin–Madison
- 2023.07 - 2024.01: Research Intern, Huawei Noah's Ark Lab
- 2021.09 - 2025.10: PhD Student, QMUL
- 2019.11 - 2020.09: Research Intern, The Cyprus Institute
Education
- PhD: Queen Mary University of London (submitted thesis in Oct 2025)
- Visiting Scholar: University of Wisconsin–Madison (timeline not specified)
- Visiting Student: University of Oxford (summer 2025)
Background
- Research Interests: AI safety and interpretability
- Professional Field: Interpretable and aligned machine learning models
- Brief Introduction: During his PhD, he focused on designing scalable methods for decomposing machine learning models' computations into human-interpretable parts, in order to better understand model behavior and steer it toward outcomes more aligned with human values. His current work also explores stronger defense mechanisms for LLM safety.
Miscellany
- Teaching Experience: Served as a teaching assistant for multiple modules, including AI Safety and Alignment, and Deep Learning and Computer Vision
- Invited Talks: Delivered a talk on Tensor Decompositions in Large-Scale Deep Learning at the Archimedes Research Unit in June 2024