Zhaowei Zhu

Google Scholar ID: YS8pSQoAAAAJ

Docta.ai; University of California, Santa Cruz

Machine Learning, Data Quality, Label Noise, Responsible AI

Citations & Impact

All-time citations: 1,600

Co-authors: list available

Contact

Email: zwzhu1995@gmail.com

Publications

11 items

Small-Margin Preferences Still Matter - If You Train Them Right

2026

OFFSIDE: Benchmarking Unlearning Misinformation in Multimodal Large Language Models

2025

The Structural Scalpel: Automated Contiguous Layer Pruning for Large Language Models

2025

LM-mixup: Text Data Augmentation via Language Model based Mixup

2025

SelectMix: Enhancing Label Noise Robustness through Targeted Sample Mixing

2025

Better Reasoning with Less Data: Enhancing VLMs Through Unified Modality Scoring

2025

Evaluating LLM-corrupted Crowdsourcing Data Without Ground Truth

2025

GUARD: Generation-time LLM Unlearning via Adaptive Restriction and Detection

2025

Resume

Academic Achievements

ICML 2023: Studied fairness evaluation using weak proxy models without ground-truth sensitive attributes; collaborators include Kevin Yuanshun Yao, Jiankai Sun, Hang Li, and Yang Liu
ICLR 2023: Investigated how self-supervised learning features benefit learning with noisy labels; collaborators include Hao Cheng, Xing Sun, and Yang Liu
ICML 2022: Proposed SimiFeat, a training-free method for noisy label detection; collaborators include Zihao Dong and Yang Liu
ICML 2022: Addressed failures of noise transition matrix estimators in non-vision tasks
Served as Area Chair for KDD 2025 Research Track (August 2024)
Led development of Docta, an open-source data health platform offering text data cleaning APIs for preference pairs, pairwise scores, and individual text scores

Background

Currently a researcher at Docta.ai
Research focuses on data-centric AI and large language models (LLMs)
Aims to advance responsible, explainable, and trustworthy AI
Particularly interested in weakly-supervised learning (including label noise, semi-supervised, and self-supervised learning)
Works on fairness in machine learning, federated learning, and addressing biases in data and algorithms

Co-authors

39 total