ICML 2023: Studied fairness evaluation using weak proxy models without ground-truth sensitive attributes; collaborators include Kevin Yuanshun Yao, Jiankai Sun, Hang Li, and Yang Liu
ICLR 2023: Investigated how self-supervised learning features benefit learning with noisy labels; collaborators include Hao Cheng, Xing Sun, and Yang Liu
ICML 2022: Proposed SimiFeat, a training-free method for noisy label detection; collaborators include Zihao Dong and Yang Liu
ICML 2022: Addressed failures of noise transition matrix estimators in non-vision tasks
Served as Area Chair for KDD 2025 Research Track (August 2024)
Led development of Docta, an open-source data health platform offering text data cleaning APIs for preference pairs, pairwise scores, and individual text scores
Background
Currently a researcher at Docta.ai
Research focuses on data-centric AI and large language models (LLMs)
Aims to advance responsible, explainable, and trustworthy AI
Particularly interested in weakly-supervised learning (including learning with label noise, semi-supervised learning, and self-supervised learning)
Works on fairness in machine learning, federated learning, and mitigating biases in data and algorithms