Publications
- "Understanding Refusal in Language Models with Sparse Autoencoders" (Preprint, 2025)
- "Debiasing CLIP: Interpreting and Correcting Bias in Attention Heads" (Preprint, 2025)
- "A comprehensive review on financial explainable AI" (AIRE Journal, 2025)
- "Self-training Large Language Models through Knowledge Detection" (EMNLP, 2024)
- "How Interpretable are Reasoning Explanations from Prompting Large Language Models?" (NAACL, 2024)
- "Plausible Extractive Rationalization through Semi-Supervised Entailment Signal" (ACL, 2024)
Research Experience
- PhD Research Project: Focuses on the intersection of NLP and interpretability, especially its application to AI safety
- Position: PhD Student
Education
- Degree: PhD
- University: Nanyang Technological University (NTU), Singapore
- Advisor: Prof. Erik Cambria
- Expected Graduation: End of 2025
- Major: Artificial Intelligence
Background
- Research Interests: Natural Language Processing and Interpretability, AI Safety
- Professional Field: AI research, particularly improving the understanding of how AI systems model complex behaviors
- Introduction: Currently a PhD student at Nanyang Technological University, Singapore, focusing on using interpretability to address AI safety issues such as jailbreak and prompt injection attacks