Publications
1. Emergence and scaling laws in SGD learning of shallow neural networks.
2. Learning Orthogonal Multi-Index Models: A Fine-Grained Information Exponent Analysis.
3. Learning and Transferring Sparse Contextual Bigrams with Linear Transformers.
4. On the Importance of Contrastive Loss in Multimodal Learning.
5. Depth-Separation with Multilayer Mean-Field Networks.
6. Understanding Deflation Process in Over-parametrized Tensor Decomposition.
Research Experience
Conducting research under the supervision of Prof. Jason D. Lee as a Ph.D. student at Princeton University.
Education
B.S. in Computer Science, Shanghai Jiao Tong University, as a member of the SJTU ACM Honors Class. M.S. in Machine Learning, Carnegie Mellon University, during which I worked with Prof. Rong Ge and Prof. Yuanzhi Li.
Background
I am a third-year Ph.D. student in the ECE department at Princeton University. My research interests include deep learning theory and applied probability.
Miscellany
Some short notes I wrote during course projects or while learning new topics:
1. Mirror Descent and Spectral Sparsification.
2. Stochastic Localization and Lee and Vempala’s result on the KLS conjecture.