Weiting Tan

Google Scholar ID: hD8E4gYAAAAJ
Johns Hopkins University, Bytedance Seed
Natural Language Processing · Multi-modal LLM · Multilingual LLM
Citations & Impact (all-time)
  • Citations: 451
  • H-index: 5
  • i10-index: 3
  • Publications: 17
  • Co-authors: 12
Resume (English only)
Academic Achievements
  • Sep 25, 2025: Process-Supervised Reinforcement Learning for Interactive Multimodal Tool-Use Agents released on arXiv.
  • Aug 20, 2025: Seeing is Believing: Emotion-Aware Audio-Visual Language Modeling for Expressive Speech Generation accepted to EMNLP 2025!
  • May 20, 2025: STAR and SSR accepted to IWSLT 2025!
  • Sep 25, 2024: DiffNorm accepted to NeurIPS 2024!
  • Paper SSR: Alignment-Aware Modality Connector for Speech Language Models proposes a method to better fuse modalities by segmenting and compressing speech features.
  • Paper DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation introduces a diffusion-based normalization strategy to simplify data distributions.
Research Experience
  • Recently focused on enabling multi-modal agents to seamlessly reason, use tools, and interact with users and environments. Also working on LLMs that unify multimodal understanding and generation.
Education
  • Currently a PhD candidate in Computer Science at Johns Hopkins University, advised by Prof. Philipp Koehn. Previously completed undergraduate and Master’s degrees in Computer Science at JHU.
Background
  • Research interests: machine learning and natural language processing, particularly efficient and scalable representation learning methods for cross-modal applications; also interested in derivatives pricing and hedging strategies.
Miscellany
  • Email: wtan12 at jhu.edu