Browse publications on Google Scholar
Resume (English only)
Academic Achievements
- Introduced the stability-based view of the flatness bias in SGD (NeurIPS 2018)
- Demonstrated the critical role of anisotropic SGD noise in sharpness control (NeurIPS 2022)
- Proved that flat minima generalize well for two-layer ReLU networks and diagonal linear networks (ICML 2023)
- Developed stability-inspired algorithms for seeking flatter minima (NeurIPS 2024)
- Discovered the Edge of Stability (EoS) phenomenon (NeurIPS 2018, Table 2)
- Made significant contributions to approximation theory for machine learning, including Barron space theory (CMS 2019, Constr. Approx. 2021), deep CNNs (NeurIPS 2023), kernels and random features (arXiv 2024), embedding theorems (JML 2023), and the duality between approximation and estimation (AoS 2025)
- Designed optimizer improvements: AdmIRE and Blockwise LR, which amplify dynamics along flat directions, and GradPower, a single-line code change that enhances gradient informativeness (see the sketch below)
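
As a rough illustration of the kind of change the GradPower entry describes, here is a minimal PyTorch sketch. It assumes the transform is the elementwise power map g ↦ sign(g)·|g|^p applied to each gradient just before the optimizer step; the exponent value, the choice of Adam, and the surrounding wiring are illustrative assumptions, not the published implementation.

```python
import torch

def power_transform(grad: torch.Tensor, p: float) -> torch.Tensor:
    # Elementwise power map: keep the sign, raise the magnitude to the p-th power.
    # Assumed form of the gradient transform, shown for illustration only.
    return grad.sign() * grad.abs().pow(p)

model = torch.nn.Linear(10, 1)                        # any model
opt = torch.optim.Adam(model.parameters(), lr=1e-3)   # any base optimizer
p = 0.8                                               # assumed exponent; a tunable hyperparameter

x, y = torch.randn(32, 10), torch.randn(32, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()

# The "single-line change": rewrite each gradient in place before stepping.
with torch.no_grad():
    for param in model.parameters():
        if param.grad is not None:
            param.grad.copy_(power_transform(param.grad, p))

opt.step()
opt.zero_grad()
```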
Background
Currently an Assistant Professor at the School of Mathematical Sciences and Center for Machine Learning Research, Peking University
Research focuses on understanding the mechanisms behind the success of deep learning, particularly on:
- Approximation and representation power of neural networks
- Dynamical behavior of optimization algorithms such as SGD and Adam
- Emergent phenomena in the training of large language models (LLMs)