Hengyi Cai
Google Scholar ID: Kz-r34UAAAAJ
Institute of Computing Technology, Chinese Academy of Sciences
Natural Language Processing
Citations & Impact (all-time)
  • Citations: 1,566
  • h-index: 12
  • i10-index: 17
  • Publications: 20
  • Co-authors: 10
Academic Achievements
  • Multiple publications in top-tier venues such as ACL, EMNLP, NeurIPS, KDD, and SIGIR. Selected publications:
    - MARA: A Multimodal Adaptive Retrieval-Augmented Framework for Document Question Answering
    - Towards AI Search Paradigm
    - Enhancing Retrieval-Augmented Generation via Evidence Tree Search
    - Multi-Agent Proactive Information Seeking with Adaptive LLM Orchestration for Non-Factoid Question Answering
    - Tool Learning with Large Language Models: A Survey
    - From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions
    - From Prompting to Alignment: A Generative Framework for Query Recommendation
    - PA-RAG: RAG Alignment via Multi-Perspective Preference Optimization
    - Explainability for Large Language Models: A Survey
    - Towards Completeness-Oriented Tool Retrieval for Large Language Models
    - AdaSwitch: Adaptive Switching between Small and Large Agents for Effective Cloud-Local Collaborative Learning
    - Towards Verifiable Text Generation with Evolving Memory and Self-Reflection
    - Cross-model Control: Improving Multiple Large Language Models in One-time Training
    - Text-Video Retrieval via Multi-Modal Hypergraph Networks
Research Experience
  • Currently a Research Scientist at Baidu Inc., where he leads the Baidu Search Foundation Model Team and focuses on developing an efficient foundation model system for search applications.
Education
  • Ph.D. from the Institute of Computing Technology, Chinese Academy of Sciences in 2021, graduating as an Outstanding Graduate.
Background
  • At Baidu, he specializes in the development and optimization of large-scale language models, with particular emphasis on MoE sparsification strategies, pre-training task design, post-training optimization, and reinforcement learning-based reasoning enhancement. His research spans text generation, question answering, and retrieval-augmented language models.
Miscellany
  • Actively serves on the program committees of leading conferences.