- NeurIPS 2025: Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models
- EMNLP 2025 Findings: When Models Reason in Your Language: Controlling Thinking Trace Language Comes at the Cost of Accuracy
- ICML 2025: GuardAgent: Safeguard LLM Agent by a Guard Agent via Knowledge-Enabled Reasoning
- ICLR 2025: MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
- ICML 2024: RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content
- ICLR 2024: BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
- NeurIPS 2023 BUGS Workshop: Oral Presentation
- NeurIPS 2023: DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models (Outstanding Paper Award)
- NeurIPS 2023: CBD: A Certified Backdoor Detector Based on Local Dominant Probability
- ICML 2023: UMD: Unsupervised Model Detection for X2X Backdoor Attacks
- ICLR 2023 BANDS Workshop: Rethinking the Necessity of Labels in Backdoor Removal
Preprints:
- How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior
- Label-Smoothed Backdoor Attack
Awards:
- NeurIPS 2023 Outstanding Paper Award
Research Experience
Baidu Research, CCL Lab - Research Intern, 2021.11 - 2022.06, supervised by Dr. Minlong Peng.
Education
Harvard University - Ph.D. in Computer Science 2024 - 2029 (expected)
University of Illinois Urbana-Champaign - B.S. in Mathematics and Computer Science 2019 - 2023, advised by Prof. Bo Li.
Background
Research Interests: Trustworthy Machine Learning.
Ph.D. student in Computer Science at Harvard University, advised by Prof. Hima Lakkaraju.