- NeurIPS 2025: Measuring the Faithfulness of Thinking Drafts in Large Reasoning Models
- EMNLP 2025 Findings: When Models Reason in Your Language: Controlling Thinking Trace Language Comes at the Cost of Accuracy
- ICML 2025: GuardAgent: Safeguard LLM Agent by a Guard Agent via Knowledge-Enabled Reasoning
- ICLR 2025: MMDT: Decoding the Trustworthiness and Safety of Multimodal Foundation Models
- ICML 2024: RigorLLM: Resilient Guardrails for Large Language Models against Undesired Content
- ICLR 2024: BadChain: Backdoor Chain-of-Thought Prompting for Large Language Models
- NeurIPS 2023 BUGS Workshop: Oral Presentation
- NeurIPS 2023: DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models (Outstanding Paper Award)
- NeurIPS 2023: CBD: A Certified Backdoor Detector Based on Local Dominant Probability
- ICML 2023: UMD: Unsupervised Model Detection for X2X Backdoor Attacks
- ICLR 2023 BANDS Workshop: Rethinking the Necessity of Labels in Backdoor Removal
Preprints:
- How Memory Management Impacts LLM Agents: An Empirical Study of Experience-Following Behavior
- Label-Smoothed Backdoor Attack
Awards:
- NeurIPS 2023 Outstanding Paper Award
Research Experience
Baidu Research, CCL Lab - Research Intern, 2021.11 - 2022.06, supervised by Dr. Minlong Peng.
Education
Harvard University - Ph.D. in Computer Science 2024 - 2029 (expected)
University of Illinois Urbana-Champaign - B.S. in Mathematics and Computer Science 2019 - 2023, advised by Prof. Bo Li.
Background
Research Interests: Trustworthy Machine Learning.
Ph.D. student in Computer Science at Harvard University, advised by Prof. Hima Lakkaraju.