Publications
- Ladder Residual: Redefining Tensor Parallelism in Transformers for Accelerated Inference (ICML 2025)
- FloE: On-the-Fly MoE Inference on Memory-constrained GPU (ICML 2025)
- Improving Model Alignment Through Collective Intelligence of Open-Source Models (ICML 2025)
- Mixture-of-Agents Enhances Large Language Model Capabilities (ICLR 2025)
- Scaling Instruction-tuned LLMs to Million-token Contexts via Hierarchical Synthetic Data Generation (ICLR 2025)
- Train Small, Infer Large: Memory-Efficient LoRA Training for Large Language Models (ICLR 2025)
Research Experience
- Together AI, Senior Staff Researcher, May 2025 - Present
- Together AI, Staff Researcher, Jul 2023 - May 2025
- Rokid, Research Intern, Jun 2018 - Sep 2018
Education
- Zhejiang University, Ph.D. in Computer Science, Sep 2018 - Jun 2023, Advisor: Prof. Lidan Shou
- ETH Zurich, Academic Guest, Mar 2021 - Sep 2021
- Université Paris-Saclay (CentraleSupélec), Master (Engineer) in General Engineering, Sep 2016 - Jun 2018
- Zhejiang University, Bachelor in Electrical Engineering, Sep 2014 - Jun 2018
Background
Currently a Senior Staff Researcher at Together AI, working closely with Prof. Ce Zhang. My research focuses on efficient and cost-effective algorithms and systems for large language models (LLMs).