Publications
- "Understanding Refusal in Language Models with Sparse Autoencoders" (Preprint, 2025)
- "Debiasing CLIP: Interpreting and Correcting Bias in Attention Heads" (Preprint, 2025)
- "A comprehensive review on financial explainable AI" (AIRE Journal, 2025)
- "Self-training Large Language Models through Knowledge Detection" (EMNLP, 2024)
- "How Interpretable are Reasoning Explanations from Prompting Large Language Models?" (NAACL, 2024)
- "Plausible Extractive Rationalization through Semi-Supervised Entailment Signal" (ACL, 2024)
Research Experience
- PhD Research Project: Focuses on the intersection of NLP and interpretability, especially its application to AI safety
- Position: PhD Student
Education
- Degree: PhD
- University: Nanyang Technological University (NTU), Singapore
- Advisor: Prof. Erik Cambria
- Expected Graduation: End of 2025
- Major: Artificial Intelligence
Background
- Research Interests: Natural Language Processing and Interpretability, AI Safety
- Professional Field: AI research, particularly improving the understanding of how AI systems model complex behaviors
- Introduction: Currently a PhD student at Nanyang Technological University, Singapore, focusing on using interpretability to address AI safety issues such as jailbreak and prompt injection attacks