- Publication: Rethinking Conventional Wisdom in Machine Learning: from Generalization to Scaling
- Publication: Scaling Exponents Across Parameterizations and Optimizers
- Publication: 4+3 Phases of Compute-Optimal Neural Scaling Laws
- Publication: Small-scale proxies for large-scale Transformer training instabilities
- Publication: Synergy and symmetry in deep learning: Interactions between the data, model, and inference algorithm
- Publication: Precise Learning Curves and Higher-Order Scalings for Dot-product Kernel Regression
- Publication: Eigenspace Restructuring: a Principle of Space and Frequency in Neural Networks
- Publication: Fast Neural Kernel Embeddings for General Activations
- Publication: Dataset Distillation with Infinitely Wide Convolutional Networks
- Publication: Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit
- Publication: Finite Versus Infinite Neural Networks: an Empirical Study
- Publication: The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks
- Publication: Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks
- Publication: Disentangling Trainability and Generalization in Deep Neural Networks
- Publication: Neural Tangents: Fast and Easy Infinite Neural Networks in Python
- Publication: Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
- Publication: Bayesian Deep Convolutional Networks with Many Channels are Gaussian Processes
- Publication: Dynamical isometry and a mean field theory of CNNs: How to train 10,000-layer vanilla convolutional neural networks
Research Experience
- Research Scientist: Google DeepMind (legacy Google Brain), NYC
- Hans Rademacher Instructor of Mathematics: University of Pennsylvania
Education
- PhD: University of Illinois at Urbana-Champaign
- BA: Zhejiang University, Hangzhou, China
Background
Research interests include scaling-centric machine learning, deep learning theory, generalization, optimization, training dynamics, kernels, and Gaussian processes. Earlier research focused on harmonic analysis.