- Multiple papers accepted at top conferences like NeurIPS and ICLR
- Representative Publications: 'Beyond Linear Probes: Dynamic Safety Monitoring for Language Models', 'Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders', and others
Research Experience
- 2025.10 - 2026.02: Postdoctoral Research Assistant, University of Oxford
- 2025.05 - 2025.09: Visiting Student, University of Oxford
- 2025.04 - 2025.09: Research Associate, AIGI Oxford
- 2024.09 - 2024.12: Honorary Associate, University of Wisconsin–Madison
- 2023.07 - 2024.01: Research Intern, Huawei Noah's Ark Lab
- 2021.09 - 2025.10: PhD Student, QMUL
- 2019.11 - 2020.09: Research Intern, The Cyprus Institute
Education
- PhD: Queen Mary University of London (submitted thesis in Oct 2025)
- Visiting Scholar: University of Wisconsin–Madison (timeline not specified)
- Visiting Student: University of Oxford (summer 2025)
Background
- Research Interests: AI safety and interpretability
- Professional Field: Interpretable and aligned machine learning models
- Brief Introduction: During his PhD, he focused on designing scalable methods for decomposing machine learning models' computations into human-interpretable parts, in order to better understand model behavior and steer it toward outcomes more aligned with human values. His current work also explores stronger defense mechanisms for LLM safety.
Miscellany
- Teaching Experience: Served as a teaching assistant for multiple modules, including AI Safety and Alignment, and Deep Learning and Computer Vision
- Invited Talks: Delivered a talk on Tensor Decompositions in Large-Scale Deep Learning at the Archimedes Research Unit in June 2024