Sunny Sanyal

Google Scholar ID: xx9rrGMAAAAJ
PhD ECE, University of Texas at Austin
Machine Learning · Language Models
Citations & Impact
All-time
Citations
297
H-index
7
i10-index
5
Publications
14
Co-authors
6
Academic Achievements
  • 'When Attention Collapses: How Degenerate Layers in LLMs Enable Smaller, Stronger Models' was featured in a popular media portal.
  • [ICML'25 Spotlight🏆] 'Upweighting Easy Samples in Fine-Tuning Mitigates Forgetting' was selected as a Spotlight Poster at ICML 2025, placing it among the top 2.6% of all 12,107 submissions.
  • [COLM'24] 'Early Weight Averaging Meets High Learning Rates for LLM Pre-training' was also presented at the NeurIPS 2023 WANT workshop, inspiring follow-up efforts at companies including eBay, ByteDance, Alibaba, and Hugging Face.
  • [NeurIPS'24 Datasets Track] 'DataComp-LM: In Search of the Next Generation of Language Model Training Sets' inspired Apple's DCLM-7B model.
Research Experience
  • Student Researcher, Foundation Research team at Google DeepMind (May-Aug 2025) | Worked on two projects on recurrence in language models.
  • Research Intern at Lightning AI (May-Aug 2024) | Topic: Efficient Fine-tuning and Continual Training of LLMs.
  • Applied Science Intern at Amazon Science, Alexa (May-Aug 2022) | Topic: Vision-Language Pre-training and Fine-tuning.
Education
  • M.Eng. in Information and Communication Engineering, Chongqing University of Posts and Telecommunications, Chongqing, China (2019).
  • B.Tech in Electronics and Communication Engineering, Maulana Abul Kalam Azad University of Technology (formerly West Bengal University of Technology), Kolkata, India.
Background
  • Research Interests: Foundation models, Transformer++ architectures, efficient pre-training, and knowledge distillation.
  • Overview: Currently a PhD student at the University of Texas at Austin, advised by Prof. Sujay Sanghavi in the Department of Electrical and Computer Engineering. Research focuses on efficient training strategies for large models, mostly LLMs. Recent work has been featured in the Ahead of AI magazine, Marktechpost, and the Interconnects newsletter.
Miscellany
  • Person who stutters. Currently on the job market, seeking both industry and postdoctoral positions.