Yihao Zhang

Google Scholar ID: 9lALkz8AAAAJ
Peking University
AI Safety, Formal Methods, Mechanistic Interpretability
Citations & Impact
All-time
Citations: 139
H-index: 7
i10-index: 6
Publications: 14
Co-authors: 7
Resume (English only)
Academic Achievements
  • Paper 'Boosting Jailbreak Attack with Momentum' accepted as Oral at ICASSP 2025
  • Two papers accepted at ICASSP 2024 (first author and second-to-last author, respectively)
  • Paper 'Adversarial Representation Engineering: A General Model Editing Framework for Large Language Models' accepted at NeurIPS 2024
  • One paper accepted at SETTA 2024 (second author)
  • Paper 'On the Duality Between Sharpness-Aware Minimization and Adversarial Training' accepted at ICML 2024
  • Two papers accepted at ICLR 2024 R2-FM Workshop (first author and second-to-last author, respectively)
  • Paper 'MedTiny: Enhanced Mediator Modeling Language for Scalable Parallel Algorithms' accepted at QRS-C 2023
  • Paper 'Sharpness-Aware Minimization Alone can Improve Adversarial Robustness' accepted at AdvML-Frontiers@ICML 2023
  • Undergraduate thesis 'Automata Extraction from Transformers' posted on arXiv
  • Awarded the Beijing Natural Science Foundation Undergraduate 'Initiating Research' Program (2023)
Background
  • First-year PhD student in Applied Mathematics at the School of Mathematical Sciences, Peking University
  • Research interests include:
  • Safety, Interpretability, and Social Value of LLM-based Agents (current focus)
  • Mechanistic Interpretability for Large Language Models (current focus)
  • Causality in AI, Formalization and Verification of Causality-Related Issues (current focus)
  • Large Language Model Alignment and Trustworthy LLMs
  • Representation Engineering in LLMs
  • AI Safety, verification of robustness/fairness/trustworthiness in AI systems
  • Automated Interactive Theorem Proving (AI4ITP)
  • Formal Methods, Model Checking, Software Analysis, Program Verification
  • Formalizing and verifying quantum computation and quantum AI systems
  • Testing technologies for AI systems