Yihao Zhang

Google Scholar ID: 9lALkz8AAAAJ

Peking University

AI Safety, Formal Method, Mechanistic Interpretability

Citations & Impact

Citations (all-time): 139

Contact

Email: zhangyihao@stu.pku.edu.cn

Publications

18 items

Symbolic-Neural Soft-Logic Reasoning: Towards Robust and Verifiable Thinking Chains via Cooperative Evolution

2026

VOW: Verifiable and Oblivious Watermark Detection for Large Language Models

2026

The Salami Slicing Threat: Exploiting Cumulative Risks in LLM Systems

2026

ClawWorm: Self-Propagating Attacks Across LLM Agent Ecosystems

2026

RACA: Representation-Aware Coverage Criteria for LLM Safety Testing

2026

CBA: Communication-Bound-Aware Cross-Domain Resource Assignment for Pipeline-Parallel Distributed LLM Training in Dynamic Multi-DC Optical Networks

2025

FAIRY2I: Universal Extremely-Low Bit QAT framework via Widely-Linear Representation and Phase-Aware Quantization

2025

Experiences from Benchmarking Vision-Language-Action Models for Robotic Manipulation

2025

Resume (English only)

Academic Achievements

Paper 'Boosting Jailbreak Attack with Momentum' accepted as Oral at ICASSP 2025
Two papers accepted at ICASSP 2024 (first author and second-to-last author, respectively)
Paper 'Adversarial Representation Engineering: A General Model Editing Framework for Large Language Models' accepted at NeurIPS 2024
One paper accepted at SETTA 2024 (second author)
Paper 'On the Duality Between Sharpness-Aware Minimization and Adversarial Training' accepted at ICML 2024
Two papers accepted at ICLR 2024 R2-FM Workshop (first author and second-to-last author, respectively)
Paper 'MedTiny: Enhanced Mediator Modeling Language for Scalable Parallel Algorithms' accepted at QRS-C 2023
Paper 'Sharpness-Aware Minimization Alone can Improve Adversarial Robustness' accepted at AdvML-Frontiers@ICML 2023
Undergraduate thesis 'Automata Extraction from Transformers' posted on arXiv
Awarded funding under the Beijing Natural Science Foundation Undergraduate 'Initiating Research' Program (2023)

Background

First-year PhD student in Applied Mathematics at the School of Mathematical Sciences, Peking University
Research interests include:
Safety, Interpretability, and Social Value of LLM-based Agents (current focus)
Mechanistic Interpretability for Large Language Models (current focus)
Causality in AI, Formalization and Verification of Causality-Related Issues (current focus)
Large Language Model Alignment and Trustworthy LLMs
Representation Engineering in LLMs
AI Safety, verification of robustness/fairness/trustworthiness in AI systems
Automated Interactive Theorem Proving (AI4ITP)
Formal Methods, Model Checking, Software Analysis, Program Verification
Formalizing and verifying quantum computation and quantum AI systems
Testing technologies for AI systems

Co-authors

7 total